Abstract

Self-sustaining autocatalytic chemical networks represent a necessary, though
not sufficient, condition for the emergence of early living systems. These
networks have been formalised and investigated within the framework of RAF
theory, which has led to a number of insights and results concerning the likelihood
of such networks forming. In this paper, we extend this analysis by focussing 
on how small autocatalytic networks are likely to be when they first emerge. 
First we show that simulations are unlikely to settle this question, by estab- 
lishing that the problem of finding a smallest RAF within a catalytic reaction 
system is NP-hard. However, irreducible RAFs (irrRAFs) can be constructed 
in polynomial time, and we show it is possible to determine in polynomial time 
whether a bounded-size set of these irrRAFs contains the smallest RAFs within
a system. Moreover, we derive rigorous bounds on the sizes of small RAFs 
and use simulations to sample irrRAFs under the binary polymer model. We 
then apply mathematical arguments to prove a new result suggested by those 
simulations: at the transition catalysis level at which RAFs first form in this 
model, small RAFs are unlikely to be present. We also investigate further the 
relationship between RAFs and another formal approach to self-sustaining and 
closed chemical networks, namely chemical organisation theory (COT). 

Keywords: Catalytic reaction system, random autocatalytic network, origin 
of life 



"Individual chemical reactions in living beings are strictly coordinated 
and proceed in a certain sequence, which as a whole forms a network 
of biological metabolism directed toward the perpetual self-preservation, 
growth, and self-reproduction of the entire system under the given envi- 
ronmental conditions" Oparin (1965) [26] 
1. Introduction

A chemical reaction system that is self-sustaining and collectively autocat- 
alytic is believed to represent an important step in the emergence of early life 
[9, 10, 19, 20]. These systems are defined by two properties: (i) each molecule 
can be built up from a small subset of pre-existing 'food' molecules by some 
reaction in the system, and (ii) each reaction is catalysed by some product 
of another reaction (or an element of the food set). Moreover, recent experi- 
mental work has demonstrated at least the possibility (and viability) of such 
sets [1, 12, 23, 28, 29, 32]. It is also of interest to develop a mathematical 
framework that allows us to study the entire universe of possible self-sustaining 
autocatalytic sets, so that general results can be established, and predictions 
made. Here, we further explore one approach ('RAF theory') which has pro- 
vided a tractable and incisive tool for addressing computational and stochastic 
questions. 

RAF theory grew out of two strands: Stuart Kauffman's pioneering work 
on random autocatalytic networks from the 1970s and 1980s [19, 20, 21], and 
analysis of the first emergence of cycles in random directed graphs by
Bollobás and Rasmussen [3]. Both of these earlier studies were explicitly motivated
by origin-of-life considerations. The approach is related to, but different from,
chemical organisation theory (COT) [6, 7] and other formal approaches of a
similar flavour, which include Petri nets [27], Rosen's (M, R) systems [18, 24],
and Eigen and Schuster's hypercycle theory [10]. 

In earlier work [13]-[17], [25, 30] we have established a series of results
concerning the structure, discovery and probability of the formation of RAF 
sets in a variety of catalytic reaction systems. When such a system contains a 
self-sustaining autocatalytic set (an 'RAF', defined below), this set can often be 
broken down into smaller RAFs until we arrive at the smallest 'building block' 
RAFs that cannot be broken down any further (cf. [33]). In this paper, we
investigate the structure of these irreducible RAFs, and bounds on the size of 
the smallest RAFs within a catalytic reaction system. 

Along the way, we derive some new facets of RAF theory, exploring further 
its relationship to COT, and the related weaker notions of pseudo-RAFs and
co-RAFs, which can be co-opted by a RAF to form a larger RAF system. While 
it is easy to determine whether a chemical reaction system contains an RAF (in 
which case there is a unique largest one [13]), we prove that finding a smallest 
RAF is an NP-hard problem. Nevertheless, the structure of the smallest ('ir- 
reducible') RAFs allows us to present efficient algorithms to find lower bounds 
on their size, and to determine whether a given collection contains the smallest 
RAF in the system. 

We begin by recalling some definitions before proceeding to the combinatorial 
and algorithmic aspects of RAFs. We then apply mathematical arguments and 
simulations to study the size and distribution of irreducible RAFs in Kauffman's 
random binary polymer model [21], and show that at a level of catalysis at which
RAFs first form, small RAFs are highly unlikely. We end with a short discussion. 
2. Definitions

To formalize the notion of a chemical reaction system (CRS), the following 
basic notation and definitions are useful: 

• Let $X = \{x_1, x_2, x_3, \ldots\}$ be a set of molecule types: each element $x_i$
represents a different type of molecule.

• Let $F \subseteq X$ be a food set, containing molecule types that are assumed to
be freely available in the environment.

• Let $r = a_1 + a_2 + \cdots + a_n \to b_1 + b_2 + \cdots + b_m$ be a chemical reaction,
transforming a set of $n$ reactants (molecule types $a_1, a_2, \ldots, a_n$) into a
set of $m$ products (molecule types $b_1, b_2, \ldots, b_m$). In principle there is no
restriction on the number of reactants or products, although in the specific
model we use (see below) $n$ and $m$ are at most two.

• Let $\mathcal{R} = \{r_1, r_2, \ldots, r_k\}$ be a set of (chemically possible) reactions.

• Let $\rho(r)$ and $\pi(r)$ denote, respectively, the set of all reactants of $r$ and the
set of all products of $r$, and for any subset $\mathcal{R}'$ of $\mathcal{R}$, let
$\rho(\mathcal{R}') = \bigcup_{r \in \mathcal{R}'} \rho(r)$ and $\pi(\mathcal{R}') = \bigcup_{r \in \mathcal{R}'} \pi(r)$.

• Let $C \subseteq \{(x, r) \mid x \in X, r \in \mathcal{R}\}$ be a catalysis set, i.e., if the
molecule-reaction pair $(x, r) \in C$ then molecule type $x$ catalyses reaction $r$.

A chemical reaction system (or, equivalently, a catalytic reaction system; 
CRS) is now defined as a tuple $\mathcal{Q} = (X, \mathcal{R}, C)$ consisting of a set of molecule
types, a set of (possible, or allowed) reactions, and a catalysis set. Based on [4], 
we can visualise a CRS as a reaction graph with two types of vertices (molecules 
and reactions) and two types of directed edges (from molecules to reactions and 
vice versa, and from catalysts to the reactions they catalyse). 

2.1. RAF sets 

Informally, a subset of reactions $\mathcal{R}'$ is an RAF (reflexively autocatalytic and
F-generated) set if it satisfies the following property:

Every reactant of every reaction in $\mathcal{R}'$ can be built up by starting from
$F$ and using just reactions in $\mathcal{R}'$, and so that all reactions are eventually
catalysed by at least one molecule that is either a product of some reaction
in $\mathcal{R}'$ or is an element of $F$.

To define an autocatalytic set more formally, we first need to define the notion
of "closure". Informally, the closure of a set of molecule types relative to a set
of reactions is the initial set of molecule types together with all the molecule
types that can be created from it by repeated application of reactions from the
given set of reactions. More formally, given a CRS $\mathcal{Q} = (X, \mathcal{R}, C)$, the closure
$\mathrm{cl}_{\mathcal{R}'}(X')$ of $X' \subseteq X$ relative to $\mathcal{R}' \subseteq \mathcal{R}$ is the (unique) minimal set
$W \subseteq X$ that contains $X'$ and satisfies the condition that, for each reaction
$r = A \to B \in \mathcal{R}'$ (with $A$ being a set of reactants and $B$ a set of products),
$A \subseteq W \Rightarrow B \subseteq W$. Notice that when $\mathcal{R}' = \emptyset$ the set $\mathrm{cl}_{\mathcal{R}'}(X')$ is still
defined, and it equals $X'$.

Figure 1: A CRS for which the maxRAF consists of the set of three reactions $\{r_1, r_2, r_3\}$. The
only other RAF present is the irrRAF $\{r_2, r_3\}$. The singleton reaction $\{r_1\}$ is not an RAF
(but it forms a co-RAF, defined later).

Our mathematical definition of RAF sets is now as follows (note that this is
the definition from [14], which is slightly modified from the original definition in
[13]). Given a CRS $\mathcal{Q} = (X, \mathcal{R}, C)$ and a food set $F \subseteq X$, a non-empty subset
$\mathcal{R}' \subseteq \mathcal{R}$ is said to be:

• Reflexively autocatalytic if, for all reactions $r \in \mathcal{R}'$, there is at least one
molecule type $x \in \mathrm{cl}_{\mathcal{R}'}(F)$ such that $(x, r) \in C$;

• F-generated if $\rho(\mathcal{R}') \subseteq \mathrm{cl}_{\mathcal{R}'}(F)$;

• Reflexively autocatalytic and F-generated (RAF) for $(\mathcal{Q}, F)$ if $\mathcal{R}'$ is both
reflexively autocatalytic and F-generated.

Because the union of RAFs for (Q,F) is also an RAF for (Q, F) it follows 
that any CRS that contains an RAF has a unique maximal RAF called the 
'maxRAF'; any other RAF is called a 'subRAF' of this maximal RAF. We say 
that an RAF is an irreducible RAF (or, more briefly, an 'irrRAF') if no proper 
subset is also an RAF. In contrast to the uniqueness of the maximal RAF, there 
may be many (indeed exponentially many) irrRAFs [15]. 

3. Characterising F-generated sets 

We have already defined the concept of being F-generated; however, it will
be useful to explore this further for the following reasons:

• to better understand the distinction between RAFs and 'pseudo-RAFs'
(defined shortly); 

• to explain the link between F-generated sets and 'organisations' in chem- 
ical organisation theory; 

• to provide a characterisation that we will require later in the proof of our 
main stochastic theorem (Theorem 4). 

Given a CRS $\mathcal{Q} = (X, \mathcal{R}, C)$ and a food set $F$, the closure set $\mathrm{cl}_{\mathcal{R}'}(F)$ has
two further equivalent descriptions. Firstly, it is the intersection of all subsets
of $X$ that contain $F$ and that are closed relative to $\mathcal{R}'$. It also has an explicit
constructive definition as follows: $\mathrm{cl}_{\mathcal{R}'}(F)$ is the final set $W_K$ in the sequence of
nested sets $F = W_0 \subset W_1 \subset \cdots \subset W_K$, where $W_{i+1}$ is equal to the union of $W_i$
and the set of products of reactions in $\mathcal{R}'$ whose reactants lie in $W_i$, and where
$K$ is the first value of $i$ for which $W_i = W_{i+1}$.
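The constructive description of the closure translates directly into a fixed-point iteration. The following Python sketch is illustrative only: the dictionary representation of reactions, the function name `closure`, and the toy reactions are assumptions introduced here and do not come from the paper.

```python
def closure(food, reactions):
    """Compute cl_{R'}(F) by iterating W_0 = F, W_1, ... until nothing changes.

    `reactions` maps a (hypothetical) reaction name to a pair
    (reactants, products), each given as a set of molecule names.
    """
    w = set(food)                          # W_0 = F
    while True:
        # Products of all reactions whose reactants already lie in W_i.
        new = set()
        for reactants, products in reactions.values():
            if reactants <= w:
                new |= products
        if new <= w:                       # W_{i+1} = W_i: fixed point reached
            return w
        w |= new                           # W_{i+1} = W_i plus the new products


# Toy example (assumed for illustration): two ligation-style reactions.
food = {"a", "b"}
toy = {
    "r2": ({"a", "b"}, {"ab"}),
    "r3": ({"ab", "b"}, {"abb"}),
}
closure_full = closure(food, toy)                       # both reactions usable
closure_r3_only = closure(food, {"r3": toy["r3"]})      # r3 can never fire
```

In this toy system, `r3` alone produces nothing because its reactant `ab` is not in the food set, so the closure relative to `{r3}` is just the food set.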

With this in hand, we now examine the definition of F-generated sets of
reactions more closely. Recall from the earlier definitions that a subset of
reactions $\mathcal{R}'$ is F-generated provided that every reactant of every reaction in
$\mathcal{R}'$ lies in $\mathrm{cl}_{\mathcal{R}'}(F)$. Note that saying $\mathcal{R}'$ is F-generated implies, but is strictly
stronger than, the condition that each reactant of each reaction in $\mathcal{R}'$ is either a
molecule in $F$ or a product of another reaction in $\mathcal{R}'$. Being F-generated is also
strictly stronger than requiring that the molecules of $X$ that are 'used up' in
maintaining the reactions in $\mathcal{R}'$ are precisely those in $F$. An example that
demonstrates both these strict containments is provided in Fig. 2 for the set
$\mathcal{R}' = \{r_1, r_2, r_3\}$, which is not F-generated (since $\mathrm{cl}_{\mathcal{R}'}(F) = F$).

We now provide precise characterizations of when a set of reactions is F- 
generated. 




Figure 2: (a) The set $\mathcal{R}' = \{r_1, r_2, r_3\}$ of reactions is not F-generated, for $F = \{f_1, f_2, f_3\}$ and
$X = F \cup \{p_1, p_2, p_3, p_4\}$. (b) The expanded reaction set $\mathcal{R} = \{r_1, r_2, r_3, r_4\}$ is F-generated, for
$F = \{f_1, f_2, f_3, f_4, f_5\}$. The (unique) ordering that satisfies the conditions of Lemma 3.1 (iii)
(or (iv)) is $r_4, r_2, r_3, r_1$.
Lemma 3.1. Given a CRS $\mathcal{Q} = (X, \mathcal{R}, C)$, a food set $F$ and a non-empty
subset $\mathcal{R}'$ of $\mathcal{R}$, the following are equivalent:

(i) $\mathcal{R}'$ is F-generated.

(ii) $\mathrm{cl}_{\mathcal{R}'}(F) = F \cup \pi(\mathcal{R}')$.

(iii) $\mathcal{R}'$ has a linear ordering $r_1, \ldots, r_k$ so that the reactants of $r_1$ are molecules
in $F$, and for each $i \in \{2, \ldots, k\}$ the reactants of $r_i$ are contained in
$\mathrm{cl}_{\{r_1, \ldots, r_{i-1}\}}(F)$.

(iv) $\mathcal{R}'$ has a linear ordering $r_1, \ldots, r_k$ so that for each $i \in \{1, \ldots, k\}$ each
reactant of $r_i$ is either an element of $F$ or is a product of some reaction
$r_j$ where $1 \leq j < i$.

Proof: The equivalence (i) $\Leftrightarrow$ (ii) is from [13] (Lemma 4.3) and the
equivalence (iii) $\Leftrightarrow$ (iv) is easily verified, as the ordering of $\mathcal{R}'$ that applies
for either part also works for the other (from the definitions). Thus, to establish this
four-way equivalence, it suffices to show that (i) $\Rightarrow$ (iii) and (iii) $\Rightarrow$ (i).

To establish (i) $\Rightarrow$ (iii), suppose that $\mathcal{R}'$ is F-generated. We construct an
ordering satisfying (iii) as follows: Let $\mathcal{R}_0$ denote the reactions in $\mathcal{R}'$ that have
their reactants in $F$, and for $i > 0$, let $\mathcal{R}_i$ denote the reactions in $\mathcal{R}'$ that have
their reactants in the set $W_i$, where $W_i$ ($i \geq 0$) is the sequence of nested
sets described in the preamble to this lemma. Then take any ordering on $\mathcal{R}'$
for which the reactions in $\mathcal{R}_i$ all come before the remaining reactions in $\mathcal{R}_{i+1}$
for $i = 0, \ldots, K - 1$. This ordering satisfies the property described in part (iii).

To establish (iii) $\Rightarrow$ (i), we only need to observe that
$\mathrm{cl}_{\{r_1, \ldots, r_{i-1}\}}(F) \subseteq \mathrm{cl}_{\mathcal{R}'}(F)$ for all $i > 1$, so if $\rho(r_i)$ is a subset of the
first set, it is necessarily a subset of the second set. This completes the proof
of Lemma 3.1.
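Part (iv) of Lemma 3.1 suggests a simple greedy test for being F-generated: repeatedly pick any reaction whose reactants are already available. The sketch below is an illustration under assumptions; the function name `f_generated_ordering` and the two toy systems (a three-reaction cycle and its extension by a fourth reaction, loosely modelled on the description of Fig. 2 but not taken from it) are hypothetical.

```python
def f_generated_ordering(food, reactions):
    """Return an ordering as in Lemma 3.1 (iv), or None if none exists.

    `reactions` maps a reaction name to (reactants, products), both sets.
    """
    available = set(food)          # F plus products of reactions ordered so far
    remaining = dict(reactions)
    order = []
    while remaining:
        ready = sorted(name for name, (reactants, _) in remaining.items()
                       if reactants <= available)
        if not ready:              # no reaction can come next: not F-generated
            return None
        name = ready[0]
        order.append(name)
        available |= remaining.pop(name)[1]
    return order


# Hypothetical cyclic system: each reaction needs a product of another one.
cyclic = {
    "r1": ({"f1", "p3"}, {"p1"}),
    "r2": ({"f2", "p1"}, {"p2"}),
    "r3": ({"f3", "p2"}, {"p3"}),
}
# Hypothetical extension: r4 makes p1 directly from two extra food molecules.
expanded = dict(cyclic, r4=({"f4", "f5"}, {"p1"}))

order_cyclic = f_generated_ordering({"f1", "f2", "f3"}, cyclic)
order_expanded = f_generated_ordering({"f1", "f2", "f3", "f4", "f5"}, expanded)
```

The cyclic system admits no valid ordering even though every reactant is either food or a product of some reaction in the set, which is the gap between being F-generated and the weaker condition discussed above.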

We now point out a consequence of this lemma that sheds some light on why
the subset $\mathcal{R}'$ in Fig. 2 fails to be F-generated. Given a CRS $\mathcal{Q} = (X, \mathcal{R}, C)$, a
food set $F$ and a subset $\mathcal{R}'$ of $\mathcal{R}$, consider the directed graph $G(\mathcal{R}')$ that has
vertex set $\mathcal{R}'$ and an arc from $r$ to $r'$ precisely if there is a reactant $x$ of $r'$ that
is a product of $r$ and, in addition, $x \notin \mathrm{cl}_{\mathcal{R}' - \{r\}}(F)$. This last condition states
that molecule $x$ cannot be built up from $F$ using only the reactions in $\mathcal{R}'$ that
do not include $r$. Note that a vertex of $G(\mathcal{R}')$ is permitted to have a loop (i.e.
an arc from a reaction to itself). As an example of this graph, for the reactions
shown in Fig. 2(a), $G(\mathcal{R}')$ is a directed three-cycle, while in part (b) of that
figure, $G(\mathcal{R}')$ has no directed cycle.

Theorem 1. Given a CRS $\mathcal{Q} = (X, \mathcal{R}, C)$ and a food set $F$, a non-empty subset
$\mathcal{R}'$ of $\mathcal{R}$ is F-generated if and only if the following two conditions hold:

(a) every reactant of a reaction in $\mathcal{R}'$ is either an element of $F$ or is a product
of some reaction in $\mathcal{R}'$; and

(b) the graph $G(\mathcal{R}')$ has no directed cycle (including loops).
Proof: Suppose that $\mathcal{R}'$ is F-generated. Then condition (a) in the theorem
follows by part (ii) of Lemma 3.1; moreover, there exists an ordering $r_1, \ldots, r_k$
of $\mathcal{R}'$ that satisfies the condition described in part (iii) of that lemma. Now, if
$(r_i, r_j)$ is an arc in $G(\mathcal{R}')$, we must have $i < j$, since otherwise, if $i \geq j$, part
(iii) of Lemma 3.1 gives:

$\rho(r_j) \subseteq \mathrm{cl}_{\{r_1, \ldots, r_{j-1}\}}(F) \subseteq \mathrm{cl}_{\mathcal{R}' - \{r_i\}}(F),$

and the containment $\rho(r_j) \subseteq \mathrm{cl}_{\mathcal{R}' - \{r_i\}}(F)$ would preclude the arc $(r_i, r_j)$ from
$G(\mathcal{R}')$. So, if $G(\mathcal{R}')$ had a directed cycle $(i_1, i_2), (i_2, i_3), \ldots, (i_r, i_1)$ (written in
terms of the indices of the reactions), we would have $i_1 < i_2 < \cdots < i_1$, a
contradiction. Thus if $\mathcal{R}'$ is F-generated, condition (b) in the theorem also holds.

Conversely, suppose that $\mathcal{R}'$ satisfies conditions (a) and (b). We first show
that there exists a reaction $r^* \in \mathcal{R}'$ that has all its reactants in $F$, i.e.
$\rho(r^*) \subseteq F$. Suppose to the contrary that this were not the case (we will show this
contradicts condition (b)). Then for every reaction $r$ in $\mathcal{R}'$, we can select a
molecule $x(r) \notin F$ that is a reactant of $r$. Moreover, by property (a) and the
condition that $x(r) \notin F$ it follows that $x = x(r)$ is the product of some other
reaction, which we will write as $r'(x)$. Thus, starting with any given reaction
$r_0$, consider the alternating sequence of molecules and reactions $(x_i, r_i)$, $i \geq 0$,
that we generate from $r_0$ by setting $x_i = x(r_i)$ and $r_{i+1} = r'(x_i)$. Since $\mathcal{R}'$
is finite, this sequence must have $r_k = r_l$ for some $0 \leq k < l$. Moreover, we
cannot have $x_i \notin \mathrm{cl}_{\mathcal{R}' - \{r_{i+1}\}}(F)$ for all $i \in [k, l - 1]$; otherwise, in the graph
$G(\mathcal{R}')$, there would be an arc from $r_{i+1}$ to $r_i$ for all $i \in [k, l - 1]$ and so we
would obtain a directed cycle in $G(\mathcal{R}')$, and by part (b), no such cycle exists.
This contradiction ensures there exists some molecule $x_i \notin F$ for $i \in [k, l - 1]$
for which $x_i \in \mathrm{cl}_{\mathcal{R}' - \{r_{i+1}\}}(F)$. However, if the closure of $F$ under any set of
reactions contains a molecule outside of $F$, then some reaction in the collection
must have all its reactants in $F$ (by Lemma 3.1). This justifies our claim that
there is a reaction $r^* \in \mathcal{R}'$ with $\rho(r^*) \subseteq F$.

We now use induction on $|\mathcal{R}'|$ to establish that conditions (a) and (b) imply
that $\mathcal{R}'$ is F-generated. For $|\mathcal{R}'| = 1$ and the non-existence of a loop from this
reaction to itself (by (b)), we see that $\mathcal{R}'$ is F-generated. Therefore suppose
that the implication holds for any set of fewer than $n$ reactions satisfying (a) and
(b), and that we have $|\mathcal{R}'| = n$. Now, consider $\mathcal{R}'' = \mathcal{R}' - \{r^*\}$ and
$F' = F \cup \pi(r^*)$, where $r^*$ is the reaction in $\mathcal{R}'$ with $\rho(r^*) \subseteq F$. Notice that
$\mathcal{R}''$ satisfies property (a) with respect to $F'$. Moreover, we claim that property
(b) also holds for $\mathcal{R}''$ since if $(r, r')$ is an arc of $G(\mathcal{R}'')$ then it is also an arc of
$G(\mathcal{R}')$. To verify this, observe that if $(r, r')$ is an arc of $G(\mathcal{R}'')$ then there exists a
reactant $x$ of $r'$ that is a product of $r$ and for which $x \notin \mathrm{cl}_{\mathcal{R}'' - \{r\}}(F')$. However:

$\mathrm{cl}_{\mathcal{R}'' - \{r\}}(F') = \mathrm{cl}_{\mathcal{R}' - \{r, r^*\}}(F') = \mathrm{cl}_{\mathcal{R}' - \{r\}}(F),$

and so $x \notin \mathrm{cl}_{\mathcal{R}' - \{r\}}(F)$, which implies that $(r, r')$ is indeed an arc of $G(\mathcal{R}')$.
Consequently, the arcs of $G(\mathcal{R}'')$ are a subset of the set of arcs of $G(\mathcal{R}')$ that
do not contain $r^*$ and so $G(\mathcal{R}'')$ cannot contain a directed cycle (or else $G(\mathcal{R}')$
would).

Thus, since $\mathcal{R}''$ satisfies properties (a) and (b), it follows (by the induction
hypothesis) that $\mathcal{R}''$ is $F'$-generated, and this implies that $\mathcal{R}'$ is F-generated.
This completes the proof of the converse result.
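The two conditions of Theorem 1 can be checked mechanically on small examples. The following Python sketch is a hedged illustration: the helper names (`closure`, `arcs_of_g`, `has_directed_cycle`, `satisfies_theorem_1`) and the toy reactions are assumptions made here, and the toy systems only mimic the cyclic shape described for Fig. 2 rather than reproducing it.

```python
def closure(food, reactions):
    """cl_{R'}(F) for reactions given as name -> (reactants, products)."""
    w, changed = set(food), True
    while changed:
        changed = False
        for reactants, products in reactions.values():
            if reactants <= w and not products <= w:
                w |= products
                changed = True
    return w


def arcs_of_g(food, reactions):
    """Arcs (r, r') of G(R'): some reactant x of r' is a product of r
    and x cannot be built from F without using r."""
    arcs = set()
    for r, (_, products_r) in reactions.items():
        others = {n: rx for n, rx in reactions.items() if n != r}
        cl_without_r = closure(food, others)
        for r2, (reactants_r2, _) in reactions.items():
            if any(x not in cl_without_r for x in products_r & reactants_r2):
                arcs.add((r, r2))
    return arcs


def has_directed_cycle(nodes, arcs):
    """Peel off vertices with no incoming arc; a cycle exists iff some remain."""
    nodes, arcs = set(nodes), set(arcs)
    while True:
        sources = {n for n in nodes if not any(v == n for (_, v) in arcs)}
        if not sources:
            return bool(nodes)
        nodes -= sources
        arcs = {(u, v) for (u, v) in arcs if u in nodes and v in nodes}


def satisfies_theorem_1(food, reactions):
    produced = set().union(*(p for _, p in reactions.values()))
    cond_a = all(rs <= set(food) | produced for rs, _ in reactions.values())
    cond_b = not has_directed_cycle(reactions, arcs_of_g(food, reactions))
    return cond_a and cond_b


# Hypothetical systems: a three-reaction cycle and its extension by r4.
cyclic = {
    "r1": ({"f1", "p3"}, {"p1"}),
    "r2": ({"f2", "p1"}, {"p2"}),
    "r3": ({"f3", "p2"}, {"p3"}),
}
expanded = dict(cyclic, r4=({"f4", "f5"}, {"p1"}))
food_small = {"f1", "f2", "f3"}
food_large = food_small | {"f4", "f5"}

arcs_cyclic = arcs_of_g(food_small, cyclic)
arcs_expanded = arcs_of_g(food_large, expanded)
```

For the cyclic system, condition (a) holds but $G(\mathcal{R}')$ is a directed three-cycle, so condition (b) fails; adding `r4` removes the arc out of `r1` and leaves an acyclic graph.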



4. Relationship with chemical organisation theory (COT) 

Chemical organisation theory (COT) [7] provides another way to study 
chemical reaction systems, and the concept of a (chemical) organisation shares 
two key properties with RAFs: closure and self-maintenance (for precise defi- 
nitions, see [7], and for recent relevant results, see [6] and [22]). However, the 
latter concept ('self maintenance') is defined somewhat differently: while RAFs 
require the property of being F-generated, an organisation is defined as self- 
sustaining under chemical dynamics, as encoded by the stoichiometric matrix. 
More precisely, if S is the stoichiometric matrix for the system, with its rows in- 
dexed by molecules and its columns by reactions, then self-maintenance requires 
a column vector v with strictly positive coordinates for which: 

$Sv \geq 0$    (1)

In words, this is the condition that the reactions can proceed at positive rates, 
so that the net rate of production of each molecule in the system is not less 
than the rate at which it is used up (otherwise such a molecule would disappear 
from the system). This is a weaker requirement than being F-generated, since
self-maintenance requires only that the system be self-sustaining once it exists, 
but does not address the question of whether the system could form in the first 
place from a set of molecules in F; we describe an example to illustrate this 
shortly. 

A second difference is that organisations allow but do not explicitly require 
reactions to be catalysed, though an extension to allow this has been discussed 
recently in [6]. Note that RAFs easily extend to allow some reactions not to be 
catalysed by introducing a putative new element of F to act as a catalyst for 
any reactions that otherwise do not require catalysis. 

A third important difference is algorithmic and we will discuss this shortly 
(a further minor difference is that organisations are subsets of molecules, while 
an RAF is a subset of reactions and molecules). The following lemma shows 
that there is a close but not identical relationship between F-generated sets and
organisations; part (i) was discovered by [6]. 

Lemma 4.1. Given a CRS $\mathcal{Q} = (X, \mathcal{R}, C)$ and food set $F$, consider the set
$\mathcal{R}_F := \{\emptyset \to f : f \in F\}$ of reactions that formally generate $F$ without using
other molecules in $X$.

(i) If $\mathcal{R}'$ is F-generated then the set of molecules $\mathrm{cl}_{\mathcal{R}'}(F)$ forms an
organisation, for the reactions $\mathcal{R}' \cup \mathcal{R}_F$.

(ii) It is possible for a set $M$ of molecules to form an organisation for a set of
reactions $\mathcal{R}' \cup \mathcal{R}_F$ but for $\mathcal{R}'$ to fail to be F-generated.



Part (i) of the lemma was established in [6] (Corollary 1). Here, we show
how it also follows as a consequence of Lemma 3.1. Firstly, if $\mathcal{R}'$ is F-generated,
then $\mathrm{cl}_{\mathcal{R}'}(F)$ is closed by the implication (i) $\Rightarrow$ (ii) in Lemma 3.1. Moreover, we
may order the reactions in $\mathcal{R}' \cup \mathcal{R}_F$ so that the reactions in $\mathcal{R}_F$ come first
(in any order) and so that the order of the subsequent reactions from $\mathcal{R}'$ is
such that the reactants of each reaction are either elements of $F$ or products of
earlier reactions; the existence of such an ordering for $\mathcal{R}'$ is provided by the
implication (i) $\Rightarrow$ (iv) in Lemma 3.1. Consider the corresponding stoichiometric
matrix $S$. Then the first non-zero element in each row of $S$ is $+1$. Now for any
real matrix with this last property, there is a strictly positive column vector $v$
for which $Sv \geq 0$, since if $S$ has $c$ columns, and if the largest absolute value of
any negative entry of $S$ is $b$ then we can take $v$ to be the strictly positive vector
that has its coordinates given by: $v_{c-i} = (b + 1)^i$ for $i = 0, \ldots, c - 1$.
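As a check on this choice of $v$ (a short verification added here as a sketch; it assumes only that every negative entry of $S$ has absolute value at most $b$ and that the first non-zero entry of each non-zero row is at least $+1$), write $v_l = (b+1)^{c-l}$ for $l = 1, \ldots, c$. If row $x$ of $S$ has its first non-zero entry in column $j$, then

\[
(Sv)_x \;\geq\; (b+1)^{c-j} - b \sum_{l=j+1}^{c} (b+1)^{c-l}
\;=\; (b+1)^{c-j} - \bigl((b+1)^{c-j} - 1\bigr) \;=\; 1,
\]

where the middle step uses the geometric series $\sum_{m=0}^{c-j-1} (b+1)^m = ((b+1)^{c-j} - 1)/b$ for $b > 0$ (when $b = 0$ there are no negative entries and the bound is immediate). Any row that is identically zero gives $(Sv)_x = 0$, so $Sv \geq 0$ in every case.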

Part (ii) is established by considering the example shown in Fig. 2(a) with
$M = F \cup \{p_1, p_2, p_3, p_4\}$ and $\mathcal{R}' = \{r_1, r_2, r_3\}$. Ordering $M$ as
$f_1, f_2, f_3, p_1, p_2, p_3, p_4$ and $\mathcal{R}' \cup \mathcal{R}_F$ as
$\emptyset \to f_1, \emptyset \to f_2, \emptyset \to f_3, r_1, r_2, r_3$, we obtain the following $7 \times 6$
stoichiometric matrix (rows are indexed by molecules; columns, by reactions):

$S =$ [$7 \times 6$ matrix; entries not recoverable from the source]

It is now clear that $Sv = [0, 0, 0, 0, 0, 0, 1]^T$ for the strictly positive vector
$v = [1, 1, 1, 1, 1, 1]^T$ and therefore the self-maintenance inequality (1) holds.
Since $M$ is closed relative to the six reactions, it follows that $M$ forms an
organisation, but $\mathcal{R}'$ fails to be F-generated, since $\mathrm{cl}_{\mathcal{R}'}(F) = F$. This completes
the proof.

A further difference between COT and RAF theory is that determining 
whether or not a CRS contains a non-empty organisation is an NP-complete 
problem (cf. [5], Section 6.2), while determining whether there exists an RAF
(necessarily non-empty) within any CRS can be decided by a polynomial time 
algorithm. We describe this now. 

4.1. The RAF algorithm and the map $\mathcal{R}' \mapsto s(\mathcal{R}')$

The usual RAF algorithm ([13, 14]) starts with the full set of reactions
and iteratively prunes out reactions until the set stabilises. For completeness,
we describe this explicitly now. Given a CRS $\mathcal{Q} = (X, \mathcal{R}, C)$ and a food set
$F$, define the following nested (decreasing) sequence of subsets of reactions
$\mathcal{R}_0, \mathcal{R}_1, \ldots, \mathcal{R}_K$ as follows:

• $\mathcal{R}_0 = \mathcal{R}$; and for $i \geq 0$,

• $\mathcal{R}_{i+1} = \{r \in \mathcal{R}_i : r$ has all its reactants and at least one catalyst in $\mathrm{cl}_{\mathcal{R}_i}(F)\}$;

• $K$ is the first value of $i$ for which $\mathcal{R}_i = \mathcal{R}_{i+1}$.

It can be shown that if $\mathcal{R}_K = \emptyset$ then $\mathcal{R}$ contains no RAF; otherwise, $\mathcal{R}_K$
is the unique maximal RAF contained in $\mathcal{R}$ (for further details, see [13, 14]).
Throughout this paper, we will let $s(\mathcal{R}')$ denote the terminal set ($\mathcal{R}_K$) obtained
by applying this process to an arbitrary subset $\mathcal{R}'$ of $\mathcal{R}$.
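The pruning procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the data layout (reactions as name-to-(reactants, products) dictionaries, catalysts as name-to-set dictionaries), the function names, and the toy CRS are all assumptions introduced here.

```python
def closure(food, reactions):
    """cl_R(F): molecules obtainable from `food` using `reactions`."""
    w, changed = set(food), True
    while changed:
        changed = False
        for reactants, products in reactions.values():
            if reactants <= w and not products <= w:
                w |= products
                changed = True
    return w


def max_raf(food, reactions, catalysts):
    """Return s(R) as a set of reaction names (empty if there is no RAF)."""
    current = dict(reactions)                       # R_0 = R
    while True:
        cl = closure(food, current)                 # cl_{R_i}(F)
        # Keep reactions with all reactants and at least one catalyst in cl.
        kept = {name: rx for name, rx in current.items()
                if rx[0] <= cl and catalysts[name] & cl}
        if len(kept) == len(current):               # R_{i+1} = R_i: stable
            return set(kept)
        current = kept


# Hypothetical toy CRS over food {a, b}.
food = {"a", "b"}
reactions = {
    "r1": ({"a"}, {"aa"}),           # a + a -> aa, catalysed by abb
    "r2": ({"a", "b"}, {"ab"}),      # a + b -> ab, catalysed by abb
    "r3": ({"ab", "b"}, {"abb"}),    # ab + b -> abb, catalysed by ab
}
catalysts = {"r1": {"abb"}, "r2": {"abb"}, "r3": {"ab"}}

whole = max_raf(food, reactions, catalysts)
without_r3 = max_raf(food, {n: reactions[n] for n in ("r1", "r2")}, catalysts)
```

In this toy system all three reactions survive pruning, whereas the subset `{r1, r2}` collapses to the empty set because its only catalyst, `abb`, can no longer be produced.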

5. Pseudo-RAFs and co-RAFs 

Note that an RAF $\mathcal{R}'$ for $(\mathcal{Q}, F)$ satisfies the following two properties:

(i) Every reaction in $\mathcal{R}'$ is catalysed by the product of another reaction from
$\mathcal{R}'$ or by an element of $F$; and

(ii) Each reactant of every reaction in $\mathcal{R}'$ is either an element in $F$ or a product
of another reaction in $\mathcal{R}'$.

We will call any subset $\mathcal{R}'$ of $\mathcal{R}$ that is non-empty and that satisfies properties
(i) and (ii) a pseudo-RAF for $(\mathcal{Q}, F)$. Not every pseudo-RAF is an RAF, as the
example in Fig. 3 shows. However, pseudo-RAFs satisfy some of the properties
of RAFs; in particular, the union of two or more pseudo-RAFs for $(\mathcal{Q}, F)$ is
a pseudo-RAF for $(\mathcal{Q}, F)$. It follows that any pair $(\mathcal{Q}, F)$ either contains no
pseudo-RAF (in which case $(\mathcal{Q}, F)$ contains no RAF either) or $(\mathcal{Q}, F)$ has a
unique maximal pseudo-RAF that contains all other pseudo-RAFs of $(\mathcal{Q}, F)$ as
well as the unique maximal RAF for $(\mathcal{Q}, F)$.



Figure 3: A pseudo-RAF which fails to be an RAF

An analogous algorithm to the RAF algorithm applies for constructing the
maximal pseudo-RAF (when it exists), the only change being that $\mathrm{cl}_{\mathcal{R}_i}(F)$ is
replaced by $F \cup \pi(\mathcal{R}_i)$ in the construction of $\mathcal{R}_{i+1}$ from $\mathcal{R}_i$ (where $\pi(\mathcal{R}_i)$ is the
set of products of reactions in $\mathcal{R}_i$).
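Since the two algorithms differ only in which pool of molecules is consulted at each pruning step, they can share one routine. The Python sketch below is illustrative: the function names (`prune`, `closure`, `food_and_products`) and the two-reaction toy system are assumptions made here, not taken from the paper or from Fig. 3.

```python
def closure(food, reactions):
    """cl_{R_i}(F): pool used by the RAF algorithm."""
    w, changed = set(food), True
    while changed:
        changed = False
        for reactants, products in reactions.values():
            if reactants <= w and not products <= w:
                w |= products
                changed = True
    return w


def food_and_products(food, reactions):
    """F union pi(R_i): pool used by the pseudo-RAF variant."""
    pool = set(food)
    for _, products in reactions.values():
        pool |= products
    return pool


def prune(food, reactions, catalysts, pool):
    """Iteratively drop reactions lacking a reactant or catalyst in the pool."""
    current = dict(reactions)
    while True:
        available = pool(food, current)
        kept = {n: rx for n, rx in current.items()
                if rx[0] <= available and catalysts[n] & available}
        if len(kept) == len(current):
            return set(kept)
        current = kept


# Hypothetical example: x and y are made only from each other.
food = {"f"}
reactions = {
    "r1": ({"x"}, {"y"}),    # x -> y, catalysed by food molecule f
    "r2": ({"y"}, {"x"}),    # y -> x, catalysed by food molecule f
}
catalysts = {"r1": {"f"}, "r2": {"f"}}

max_pseudo_raf = prune(food, reactions, catalysts, food_and_products)
max_raf = prune(food, reactions, catalysts, closure)
```

Here `{r1, r2}` is a pseudo-RAF, but neither `x` nor `y` can be built from the food set, so the RAF algorithm prunes both reactions.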

The RAF and pseudo-RAF algorithms have a similar flavour to the 'unit 
propagation' method of solving the propositional logic problem 'HORN-SAT', 
and in [16] we showed that HORN-SAT can be solved by an extension of the 
RAF algorithm. 

5.1. co-RAFs 

Although a pseudo-RAF cannot become established by itself (since it is
not F-generated), it can nevertheless become established in the presence of
another RAF. This property is not unique to pseudo-RAFs, and we formalise
and investigate this notion as follows.

Given a CRS $\mathcal{Q} = (X, \mathcal{R}, C)$ and a food set $F$, we will say that a subset $\mathcal{R}'$
of $\mathcal{R}$ is a co-RAF for $(\mathcal{Q}, F)$ if $\mathcal{R}'$ is a non-empty set for which there exists some
RAF $\mathcal{R}_1$ for $(\mathcal{Q}, F)$, which is disjoint from $\mathcal{R}'$ and whose union with $\mathcal{R}'$,
$\mathcal{R}_1 \cup \mathcal{R}'$, forms an RAF for $(\mathcal{Q}, F)$.

A simple example of a co-RAF is the set $\{r_1\}$ in Fig. 1. Informally, a co-RAF
is a system that may not have enough structure to form an RAF by itself, but
which another (disjoint) RAF can co-opt to form a larger RAF. Note that a
co-RAF may fail to be an RAF because either a reactant or a catalyst (or both)
can fail to be in the closure of $F$; in either case, $\mathcal{R}_1$ can provide the missing
F-generated reactant or catalyst.

The relationship between an RAF $\mathcal{R}_1$ and an associated co-RAF $\mathcal{R}'$ is similar
to the relationship between a 'viable core' and an associated 'periphery' in [33].
The requirement that $\mathcal{R}'$ and $\mathcal{R}_1$ are disjoint in the definition of a co-RAF is not
a serious restriction, since if $\mathcal{R}' \cup \mathcal{R}_1$ is an RAF for $(\mathcal{Q}, F)$, where $\mathcal{R}'$ overlaps
(but is not strictly contained within) an RAF $\mathcal{R}_1$, then $\mathcal{R}' - (\mathcal{R}' \cap \mathcal{R}_1)$ is a
co-RAF for $(\mathcal{Q}, F)$.

Determining whether a given subset $\mathcal{R}'$ of $\mathcal{R}$ is a co-RAF for $(\mathcal{Q}, F)$ can be
solved in polynomial time by virtue of the following result (the equivalence of
parts (i) and (ii)). We also give two other alternative descriptions of co-RAFs.
The proofs of these results are presented in the Appendix.

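Proposition 5.1. Given a CRS $\mathcal{Q} = (X, \mathcal{R}, C)$ and a food set $F$, let $\mathcal{R}'$ be a
non-empty subset of $\mathcal{R}$. The following are equivalent:

(i) $\mathcal{R}'$ is a co-RAF for $(\mathcal{Q}, F)$;

(ii) $s(\mathcal{R} - \mathcal{R}') \neq \emptyset$ and $\mathcal{R}' \cup s(\mathcal{R} - \mathcal{R}')$ is an RAF for $(\mathcal{Q}, F)$;

(iii) $\mathcal{R}' = \mathcal{R}_B - \mathcal{R}_A$ for two RAFs $\mathcal{R}_A, \mathcal{R}_B$ for $(\mathcal{Q}, F)$, where $\mathcal{R}_A \subset \mathcal{R}_B$;

(iv) $\mathcal{R}'$ is an RAF for $(\mathcal{Q}, F')$ where $F' = F \cup \pi(\mathcal{R}_1)$, for some RAF $\mathcal{R}_1$ for
$(\mathcal{Q}, F)$ that is disjoint from $\mathcal{R}'$.

Note that the equivalence (i) $\Leftrightarrow$ (iii) provides a simple way to generate
co-RAFs: any non-maximal RAF $\mathcal{R}_A$ for $\mathcal{Q}$ has a co-RAF; simply let $\mathcal{R}_B$ be the
maximal RAF, and take $\mathcal{R}' = \mathcal{R}_B - \mathcal{R}_A$.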
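Part (ii) of Proposition 5.1 gives a direct polynomial-time test for co-RAFs using two runs of the RAF algorithm. The sketch below is a hedged illustration; the function names, the data layout, and the toy CRS are assumptions introduced here. It relies on the fact that a non-empty set of reactions is an RAF exactly when the RAF algorithm applied to it removes nothing.

```python
def closure(food, reactions):
    w, changed = set(food), True
    while changed:
        changed = False
        for reactants, products in reactions.values():
            if reactants <= w and not products <= w:
                w |= products
                changed = True
    return w


def max_raf(food, reactions, catalysts):
    """s(R'): names of reactions surviving the RAF pruning algorithm."""
    current = dict(reactions)
    while True:
        cl = closure(food, current)
        kept = {n: rx for n, rx in current.items()
                if rx[0] <= cl and catalysts[n] & cl}
        if len(kept) == len(current):
            return set(kept)
        current = kept


def is_co_raf(food, reactions, catalysts, subset):
    """Test condition (ii) of Proposition 5.1 for the reaction names in `subset`."""
    subset = set(subset)
    if not subset:
        return False
    rest = {n: rx for n, rx in reactions.items() if n not in subset}
    core = max_raf(food, rest, catalysts)            # s(R - R')
    if not core:
        return False
    union = {n: reactions[n] for n in subset | core}
    # The union is an RAF iff pruning leaves it unchanged.
    return max_raf(food, union, catalysts) == set(union)


# Hypothetical toy CRS: {r2, r3} is an RAF; r1 needs its product abb as catalyst.
food = {"a", "b"}
reactions = {
    "r1": ({"a"}, {"aa"}),
    "r2": ({"a", "b"}, {"ab"}),
    "r3": ({"ab", "b"}, {"abb"}),
}
catalysts = {"r1": {"abb"}, "r2": {"abb"}, "r3": {"ab"}}

r1_is_co_raf = is_co_raf(food, reactions, catalysts, {"r1"})
r2_is_co_raf = is_co_raf(food, reactions, catalysts, {"r2"})
```

In this toy system `{r1}` is a co-RAF (it is co-opted by the RAF `{r2, r3}`), while `{r2}` is not, because removing `r2` leaves no RAF at all.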
6. Minimal RAFs and irrRAFs

Given a CRS $\mathcal{Q} = (X, \mathcal{R}, C)$ and a food set $F$, we can find an irrRAF
efficiently (i.e. in polynomial time), but finding a minimal-sized RAF is much 
harder - we will show it is an NP-hard problem even to determine this minimal 
size. Nevertheless, it is possible to test whether a given irrRAF for (Q,F) is 
the only irrRAF for (Q,F) - and if it is, then it is necessarily a minimal sized 
RAF. More generally, if we generate irrRAFs for (Q,F) (each in polynomial 
time), and have found only a relatively small number of them (e.g. < 10 or so) 
then it is possible to test whether these are the only irrRAFs for (Q, F) and, if 
so, the one(s) of smallest size are the minimal-sized RAFs for (Q,F). We now 
show how this can be solved efficiently (in polynomial time), provided that we 
bound the number of irrRAFs. 

6.1. Do we have all the irrRAFs? 

Suppose that a CRS $(X, \mathcal{R}, C)$ with a food set $F \subseteq X$ has an RAF. Let
$\mathcal{R}_1, \mathcal{R}_2, \ldots, \mathcal{R}_k$ be a collection of distinct irrRAFs that have been constructed
from this RAF (e.g. by our search algorithm). We would like to be able to
determine whether these are all the irrRAFs for $(X, \mathcal{R}, C, F)$. The following
result provides a way to do this for moderate values of $k$. Recall that for a
subset $\mathcal{R}'$ of $\mathcal{R}$, $s(\mathcal{R}')$ is the result of applying the RAF algorithm to $\mathcal{R}'$.

Theorem 2. Suppose that a CRS $(X, \mathcal{R}, C)$ with a food set $F$ has an RAF.
Then a collection $\mathcal{R}_1, \ldots, \mathcal{R}_k$ of distinct irrRAFs constitutes the set of all the
irrRAFs for $(X, \mathcal{R}, C, F)$ if and only if the following condition holds:

For all $(r_1, r_2, \ldots, r_k) \in \mathcal{R}_1 \times \mathcal{R}_2 \times \cdots \times \mathcal{R}_k$, we have $s(\mathcal{R} - \{r_1, r_2, \ldots, r_k\}) = \emptyset$.

Proof: Suppose first that for some $(r_1, r_2, \ldots, r_k) \in \mathcal{R}_1 \times \mathcal{R}_2 \times \cdots \times \mathcal{R}_k$ we
have $s(\mathcal{R} - \{r_1, r_2, \ldots, r_k\}) \neq \emptyset$. Then $s(\mathcal{R} - \{r_1, r_2, \ldots, r_k\})$ is an RAF and so
it contains at least one irrRAF, say $\mathcal{R}'$. Since

$\mathcal{R}' \subseteq s(\mathcal{R} - \{r_1, r_2, \ldots, r_k\}) \subseteq \mathcal{R} - \{r_1, r_2, \ldots, r_k\},$

$\mathcal{R}'$ cannot equal $\mathcal{R}_i$ for any $i$, since $\mathcal{R}'$ does not contain $r_i$, but $\mathcal{R}_i$ does. Thus,
$\mathcal{R}_1, \ldots, \mathcal{R}_k$ does not constitute the set of all irrRAFs of $(X, \mathcal{R}, C, F)$.

Conversely, suppose that $\mathcal{R}_1, \ldots, \mathcal{R}_k$ is not the set of all irrRAFs. Let $\mathcal{R}'$ be
any other irrRAF. Then $\mathcal{R}_i$ is not strictly contained within $\mathcal{R}'$ for any $i$ because
otherwise $\mathcal{R}'$ would not be an irrRAF. Thus for each $i$, there exists some reaction
$r_i \in \mathcal{R}_i - \mathcal{R}'$ and thus a sequence $(r_1, r_2, \ldots, r_k) \in \mathcal{R}_1 \times \mathcal{R}_2 \times \cdots \times \mathcal{R}_k$. Now
consider $s(\mathcal{R} - \{r_1, r_2, \ldots, r_k\})$. Since $\mathcal{R}'$ is a subset of $\mathcal{R} - \{r_1, r_2, \ldots, r_k\}$, it
follows that

$\mathcal{R}' = s(\mathcal{R}') \subseteq s(\mathcal{R} - \{r_1, r_2, \ldots, r_k\}),$

and so $s(\mathcal{R} - \{r_1, r_2, \ldots, r_k\}) \neq \emptyset$. This completes the proof.
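The condition in Theorem 2 can be tested by enumerating one reaction from each known irrRAF and running the RAF algorithm on what remains. The following Python sketch is an illustration under assumptions: the function names, data layout, and toy CRS are hypothetical, and the enumeration visits up to $|\mathcal{R}_1| \cdots |\mathcal{R}_k|$ tuples, so it is only practical for small $k$.

```python
from itertools import product


def closure(food, reactions):
    w, changed = set(food), True
    while changed:
        changed = False
        for reactants, products in reactions.values():
            if reactants <= w and not products <= w:
                w |= products
                changed = True
    return w


def max_raf(food, reactions, catalysts):
    """s(R'): names of reactions surviving the RAF pruning algorithm."""
    current = dict(reactions)
    while True:
        cl = closure(food, current)
        kept = {n: rx for n, rx in current.items()
                if rx[0] <= cl and catalysts[n] & cl}
        if len(kept) == len(current):
            return set(kept)
        current = kept


def have_all_irr_rafs(food, reactions, catalysts, irr_rafs):
    """True iff removing one reaction from each listed irrRAF always kills every RAF."""
    for choice in product(*irr_rafs):
        removed = set(choice)
        rest = {n: rx for n, rx in reactions.items() if n not in removed}
        if max_raf(food, rest, catalysts):      # some RAF avoids all chosen reactions
            return False
    return True


# Hypothetical toy CRS with two irrRAFs: {r2, r3} and {r4}.
food = {"a", "b"}
reactions = {
    "r1": ({"a"}, {"aa"}),
    "r2": ({"a", "b"}, {"ab"}),
    "r3": ({"ab", "b"}, {"abb"}),
    "r4": ({"b"}, {"bb"}),           # b + b -> bb, catalysed by food molecule a
}
catalysts = {"r1": {"abb"}, "r2": {"abb"}, "r3": {"ab"}, "r4": {"a"}}

incomplete = have_all_irr_rafs(food, reactions, catalysts, [{"r2", "r3"}])
complete = have_all_irr_rafs(food, reactions, catalysts, [{"r2", "r3"}, {"r4"}])
```

With only `{r2, r3}` listed, the test fails because `{r4}` survives the removal of `r2`; once both irrRAFs are supplied, every choice of removed reactions leaves no RAF.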

Remark: For any given value of k, determining whether or not we have 
all the irrRAFs can be solved in polynomial time (in the size of the CRS). Of 
course, the exponent in the polynomial involves k, so it would also be interesting
to see if this exponential dependency on k can be removed (and, if not, whether 
the problem is fixed parameter tractable in k). 

6.2. Finding a smallest RAF is hard 

Given a CRS and a food set (Q, F), finding a largest RAF can be solved 
by a polynomial time algorithm. This raises an obvious question: is there an 
efficient way to find the smallest RAF for (Q,F), or at least to calculate its 
size? A related question replaces 'smallest RAF' with 'smallest irrRAF', but it 
is clear that any smallest RAF must also be irreducible so the two questions are 
equivalent. Consider then the decision problem: 

MIN-RAF 

INSTANCE: A catalytic reaction system and food set $(X, \mathcal{R}, C, F)$, and a
positive integer $k$.

QUESTION: Does $\mathcal{R}$ contain a subset of size at most $k$ that forms an RAF
for $(X, \mathcal{R}, C, F)$?

Theorem 3. 

(i) The decision problem MIN-RAF is NP-complete. 

(ii) Counting the number of sub-RAFs (or smallest sub-RAFs) of an arbitrary 
RAF is #P-complete. 

The proof of this theorem involves a reduction of MIN-RAF to the graph 
theory problem VERTEX COVER, by associating with each CRS a graph that 
has its vertex covers of size K in one-to-one correspondence with the sub-RAFs 
of the CRS of size K + constant. The details of the construction and the full
proof of Theorem 3 are provided in the Appendix. 

6.3. Lower bounds on the size of RAFs 

In the light of Theorem 3, an interesting question is whether we can efficiently 
compute lower bounds on the size of an RAF. The first lower bound is easily 
computed. 

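Lemma 6.1. Consider a catalytic reaction system $\mathcal{Q}$ and a food set $F$. Let

$$\mathcal{R}_0 = \{r \in \mathcal{R} : s(\mathcal{R} - \{r\}) = \emptyset\}.$$

Then every RAF for $(\mathcal{Q}, F)$ has size at least $|\mathcal{R}_0|$.

Proof: Let $\mathcal{R}'$ be an RAF for $(\mathcal{Q}, F)$. Suppose that $r \in \mathcal{R}_0$. If $r \in \mathcal{R} - \mathcal{R}'$
then $\mathcal{R}' = s(\mathcal{R}') \subseteq s(\mathcal{R} - \{r\}) = \emptyset$, which is not possible, since an RAF is
non-empty, by definition. Thus, $r \in \mathcal{R}'$. Since this holds for all $r \in \mathcal{R}_0$ it follows
that $\mathcal{R}_0 \subseteq \mathcal{R}'$, and so $|\mathcal{R}_0| \le |\mathcal{R}'|$. This completes the proof.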
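
The bound of Lemma 6.1 can be computed by running the RAF algorithm once per reaction. The following is a minimal sketch under stated assumptions: the data layout (a dictionary mapping a reaction name to a triple of reactants, products and catalysts), the function names and the three-reaction toy system are all illustrative and are not taken from the paper; `max_raf` is a straightforward rendering of the usual reduction that computes $s(\mathcal{R})$.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A reaction is (reactants, products, catalysts); this layout is an assumption.
Reaction = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]
CRS = Dict[str, Reaction]


def closure(reactions: CRS, food: Set[str]) -> Set[str]:
    """Molecules obtainable from the food set (catalysis is ignored here)."""
    available = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products, _ in reactions.values():
            if reactants <= available and not products <= available:
                available |= products
                changed = True
    return available


def max_raf(reactions: CRS, food: Set[str]) -> CRS:
    """Compute s(R): drop reactions lacking a reactant or a catalyst in the
    closure of the food set, and repeat until nothing changes."""
    current = dict(reactions)
    while True:
        available = closure(current, food)
        kept = {name: rxn for name, rxn in current.items()
                if rxn[0] <= available and rxn[2] & available}
        if len(kept) == len(current):
            return kept
        current = kept


def essential_reactions(reactions: CRS, food: Set[str]) -> Set[str]:
    """R_0 of Lemma 6.1: reactions r for which s(R - {r}) is empty."""
    return {name for name in reactions
            if not max_raf({k: v for k, v in reactions.items() if k != name}, food)}


# Hypothetical toy system with single-letter molecule names.
fs = frozenset
food = {"a", "b"}
toy: CRS = {
    "r1": (fs("ab"), fs("c"), fs("d")),  # a + b -> c, catalysed by d
    "r2": (fs("ac"), fs("d"), fs("c")),  # a + c -> d, catalysed by c
    "r3": (fs("bc"), fs("e"), fs("d")),  # b + c -> e, catalysed by d
}
lower_bound = len(essential_reactions(toy, food))
```

In this toy system removing `r1` or `r2` destroys every RAF, whereas `r3` is dispensable, so the lower bound equals the size of the smallest RAF `{r1, r2}`. In general the bound can be far from tight.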

Part (ii) of the following Lemma provides a further computable lower bound
on the smallest RAF, if we require the RAF to have the additional property
that none of its reactions is catalysed by a food molecule. Given a CRS $\mathcal{Q}$,
let $G'(\mathcal{R}')$ be the graph with the vertex set $\mathcal{R}'$ and with an arc from reaction $r$
to reaction $r'$ precisely if some product of $r$ is a catalyst of $r'$. Part (i) of the
lemma is essentially the 'Loop Theorem' of [6] (Theorem 2).

Lemma 6.2. Consider a catalytic reaction system $\mathcal{Q} = (X, \mathcal{R}, C)$ and a food
set $F$.

(i) If $\mathcal{R}'$ is an RAF for $(\mathcal{Q}, F)$ and no reaction in $\mathcal{R}'$ is catalysed by any food
molecule then $G'(\mathcal{R}')$ contains a directed cycle.

(ii) Provided that $s(\mathcal{R}) \neq \emptyset$ (i.e. $(\mathcal{Q}, F)$ has an RAF), the smallest RAF for
$(\mathcal{Q}, F)$ for which no reaction is catalysed by a food molecule is at least as
large as the length of the shortest directed cycle in $G'(s(\mathcal{R}))$, and this can
be computed in polynomial time in the size of $\mathcal{Q}$.

Proof: Part (i): A classic, elementary result (cf. [2], Proposition 1.4.2) states
that any digraph that has no vertex of in-degree 0 must have a directed cycle.
Now if $r \in \mathcal{R}'$ then $r$ has a catalyst in $\mathrm{cl}_{\mathcal{R}'}(F)$ and so this catalyst is either the
product of some reaction in $\mathcal{R}'$ or it is in $F$. However, the latter possibility is
ruled out by the stated assumption concerning $\mathcal{R}'$. Thus each vertex of $G'(\mathcal{R}')$
has positive in-degree and so this digraph has a directed cycle.

Part (ii): Suppose $\mathcal{R}'$ is the smallest RAF for $(\mathcal{Q}, F)$ in which no reaction is
catalysed by a food molecule. From part (i), $G'(\mathcal{R}')$ contains a directed cycle
of some length $k$, so $|\mathcal{R}'| \ge k$. Moreover, since $\mathcal{R}' \subseteq s(\mathcal{R})$,
$k$ is at least the length of the shortest directed cycle in $G'(s(\mathcal{R}))$, as claimed.
Finally, since $s(\mathcal{R})$ can be computed in polynomial time (by the RAF algorithm
from Section 4.1), and thus $G'(s(\mathcal{R}))$ can be also, one can find the shortest
directed cycle in this graph by an application of the Floyd-Warshall algorithm,
or via Dijkstra's algorithm (see, for example, [2]).
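
As an illustration of part (ii), the sketch below builds the graph $G'$ for a given reaction set and finds the length of its shortest directed cycle. Note that it uses a breadth-first search from every vertex rather than the Floyd-Warshall or Dijkstra algorithms mentioned in the proof; since all arcs have unit length, this yields the same value. The input layout (reaction name mapped to a pair of product and catalyst sets), the function names and the example are assumptions made for illustration only.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Tuple

# Each reaction is described only by (products, catalysts); reactants are
# not needed to build the catalysis graph G'.
ProdCat = Tuple[FrozenSet[str], FrozenSet[str]]


def catalysis_graph(reactions: Dict[str, ProdCat]) -> Dict[str, List[str]]:
    """Arc r -> r2 whenever some product of r is a catalyst of r2."""
    return {
        r: [r2 for r2, (_, cats2) in reactions.items() if prods & cats2]
        for r, (prods, _) in reactions.items()
    }


def shortest_cycle_length(graph: Dict[str, List[str]]) -> Optional[int]:
    """Length of the shortest directed cycle, or None if the graph is acyclic."""
    best: Optional[int] = None
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt == start:
                    # Vertices are dequeued in order of distance, so the first
                    # arc back to `start` closes a shortest cycle through it.
                    length = dist[node] + 1
                    if best is None or length < best:
                        best = length
                    queue.clear()
                    break
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
    return best


# Hypothetical example: r1 -> r2 -> r3 -> r1 is the only cycle; r4 is a dead end.
fs = frozenset
example = {
    "r1": (fs({"c"}), fs({"e"})),
    "r2": (fs({"d"}), fs({"c"})),
    "r3": (fs({"e"}), fs({"d"})),
    "r4": (fs({"g"}), fs({"c"})),
}
cycle_bound = shortest_cycle_length(catalysis_graph(example))
```

To obtain the bound of the lemma, the function would be applied to the reactions of $s(\mathcal{R})$ rather than to an arbitrary reaction set.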

7. Minimal RAFs in the binary polymer model 

To investigate the issue of the smallest RAFs empirically, we used the binary 
polymer model to collect statistics on the sizes of RAF and irrRAF sets. This 
model has all binary sequences of length at most n as its molecules, and the 
reactions consist of ligation reactions (joining two sequences to form a longer 
sequence), together with the reversal of this operation (cleavage reactions, in 
which a sequence is split into two subsequences). Examples of ligation and 
cleavage reactions are $0101 + 001 \to 0101001$ and $11110 \to 111 + 10$, respectively.

In this model, a ligation reaction and its associated cleavage reaction are
often regarded as the same (reversible) 'cleavage-ligation' reaction. We let
$\mathcal{R} = \mathcal{R}_n$ denote this set of cleavage-ligation reactions, and for a subset $\mathcal{R}'$ of
$\mathcal{R}_n$, the set $\pi(\mathcal{R}')$ will be taken to be the set of products of the cleavage and
ligation reactions associated with $\mathcal{R}'$ (and so the closure of $F$ relative to $\mathcal{R}'$ is
the closure of $F$ relative to the union of the associated cleavage and ligation
reactions).

In the simplest form of this model, each molecule $x$ catalyses any given
cleavage-ligation reaction $r$ independently with probability $p = p_n$, which
depends on $n$. The food set $F$ is usually chosen to be all binary sequences of
length at most $t$ for a small value of $t$ (typically, $t = 2$, in which case $|F| = 6$).

In previous work, we already studied how the probability of RAF sets exist- 
ing in this model scales with the value of n (the maximum length of molecules). 
Here, we simply chose one value (n = 10) and computed the sizes of RAF sets 
for various values of p (the probability that a given molecule catalyses a given 
cleavage-ligation reaction) or, equivalently, the level of catalysis $f = p|\mathcal{R}|$ (the
average number of reactions catalysed per molecule). 
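
For concreteness, the molecule and reaction sets of the model are small enough to enumerate directly for n = 10. The sketch below is illustrative only: the function names and the representation of a cleavage-ligation reaction as a triple `(left, right, whole)` are assumptions, not the authors' implementation. It shows how a level of catalysis $f$ translates into a catalysis probability $p = f/|\mathcal{R}|$.

```python
from itertools import product
from typing import List, Tuple


def molecules(max_len: int) -> List[str]:
    """All binary sequences of length 1..max_len."""
    return ["".join(bits)
            for length in range(1, max_len + 1)
            for bits in product("01", repeat=length)]


def cleavage_ligation_reactions(n: int) -> List[Tuple[str, str, str]]:
    """One reversible reaction (left, right, whole) for every way of cutting
    a sequence of length 2..n into two non-empty pieces."""
    return [(m[:i], m[i:], m)
            for m in molecules(n) if len(m) >= 2
            for i in range(1, len(m))]


n = 10
food = molecules(2)                     # t = 2
reactions = cleavage_ligation_reactions(n)

f = 1.20                                # level of catalysis
p = f / len(reactions)                  # per-pair catalysis probability
```

Each sequence of length $L$ contributes $L - 1$ reactions, so $|\mathcal{R}_n| = \sum_{L=2}^{n} (L-1)2^L = (n-2)2^{n+1} + 4$.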

Fig. 4 shows the average sizes of RAF sets (black squares) and irrRAF sets
(crosses) for increasing levels of catalysis. These data points are averages over
1000 instances of the model for each value of $p$. When the level of catalysis
is too low ($f < 1.20$), no RAF sets are found at all, i.e., their sizes are equal
to zero. However, at a level of catalysis just above $f = 1.20$, the first RAF
sets start to appear. Initially, they are found in only 6 out of 1000
model instances, but with increasing levels of catalysis $f$, they become more
and more frequent, and their sizes seem to increase linearly with $f$. In contrast,
the average size of irrRAFs remains constant (for each non-empty RAF set, one
(arbitrary) irrRAF set was generated) as the rate of catalysis increases across
this narrow interval.



Figure 4: The average sizes of RAF and irrRAF sets for increasing levels of catalysis for n = 10
in the binary polymer model.

An interesting feature of Fig. 4 is that the sizes of the RAF sets when they
first start appearing (around $f = 1.20$) are already quite large: 1222 reactions on
average in the six RAF sets and 624 reactions in the corresponding six irrRAF
sets (with $|\mathcal{R}| = 16388$ for the full reaction set). So, it seems there are no
"small" RAFs when they are only just starting to appear. This observation is
formalised in the following theorem, which shows that at the catalysis levels at
which RAFs have a moderate probability of occurring, the smallest RAFs have
a size that grows exponentially with n.

Theorem 4 (Threshold catalysis RAFs have exponential size in n).

Consider the binary polymer model $\mathcal{Q}_n$ for sequences up to length $n$. Select any
fixed value $v < 1$ and then select the catalysis probability $p = p_n$ so that

$$\Pr(\exists \text{ RAF for } \mathcal{Q}_n) = v.$$

Then, for any constant $c < \frac{1}{3}$:

$$\Pr(\exists \text{ RAF } \mathcal{R}' \text{ for } \mathcal{Q}_n : |\mathcal{R}'| \le 2^{cn}) \to 0$$

as $n \to \infty$.

Proof: For any subset $\mathcal{R}'$ of $\mathcal{R}_n$ with $s = |F \cup \pi(\mathcal{R}')|$ we have $|\mathrm{cl}_{\mathcal{R}'}(F)| \le s$,
and so the probability that an arbitrary reaction $r \in \mathcal{R}'$ is catalysed by at
least one element of $\mathrm{cl}_{\mathcal{R}'}(F)$ is at most $1 - (1 - p_n)^s$. Consequently, if, in
addition, $\mathcal{R}'$ has size $k$, the probability that $\mathcal{R}'$ is reflexively autocatalytic is at
most $(1 - (1 - p_n)^s)^k$. Now, we can provide a further upper bound on this last
probability by an expression that involves just $k$ (and not $s$) by observing that:

$$(1 - (1 - p_n)^s)^k \le (s p_n)^k \le [(3k + |F|) p_n]^k, \tag{2}$$

by noting that $s \le 3k + |F|$, since each reaction in $\mathcal{R}'$ is associated with at most
three distinct molecules.

In summary, the probability that any subset $\mathcal{R}'$ of $\mathcal{R}_n$ of size $k$ is reflexively
autocatalytic is, at most:

$$[(3k + |F|) p_n]^k. \tag{3}$$

Let $S_{n,k}$ be the number of subsets of $\mathcal{R}_n$ of size $k$ that are $F$-generated. Boole's
inequality, combined with the upper bound (3), implies that the probability that
$\mathcal{Q}_n$ has an RAF of size $k$ is bounded above by $S_{n,k} \cdot [(3k + |F|) p_n]^k$. Thus:

$$\Pr(\exists \text{ RAF } \mathcal{R}' \text{ for } \mathcal{Q}_n : |\mathcal{R}'| \le m) \le \sum_{k=1}^{m} S_{n,k} \cdot [(3k + |F|) p_n]^k.$$

Now, the value of $p_n$ for which $\Pr(\exists \text{ RAF for } \mathcal{Q}_n) = v$ is bounded above by
$\lambda_v n / |\mathcal{R}_n|$ for some value $\lambda_v$ dependent only on $v$ (by [25] [Theorem 4.1], and
[13] [Proposition 8.1]). Thus:

$$\Pr(\exists \text{ RAF } \mathcal{R}' \text{ for } \mathcal{Q}_n : |\mathcal{R}'| \le m) \le \sum_{k=1}^{m} S_{n,k} \cdot [(3k + |F|) \lambda_v n / |\mathcal{R}_n|]^k. \tag{4}$$

Now, by Lemma 3.1, any set of reactions is F-generated if and only if the 
reactions can be linearly ordered so that every reaction in the sequence has its 
reactants provided either from F or from the products of earlier reactions in the 
sequence (or both). 

Therefore, $S_{n,k}$ is bounded above by the number of ordered sequences
$r_1, r_2, \ldots, r_k$ where, for all $j$ with $0 \le j < k$:

(*) $r_{j+1}$ is a cleavage or ligation reaction involving one or two (respectively)
molecules of $X_j := F \cup \pi(\{r_1, \ldots, r_j\})$ (taking $X_0 = F$).

Now, each reaction in the sequence $r_1, r_2, \ldots, r_k$ creates, at most, two new
molecules, and so $|X_{j+1}| \le |X_j| + 2$ for all $j$. Since $X_0 = F$, we have for all
$0 \le j \le k - 1$:

$$|X_j| \le |F| + 2j. \tag{5}$$

Now, given $r_1, \ldots, r_j$ (where $j < k$), the number of possible choices for $r_{j+1}$ to
satisfy condition (*) above is, at most:

$$|X_j|^2 + n \cdot |X_j|,$$

since the first term in this sum is an upper bound on the number of possible
ligation reactions, while the second term is an upper bound on the number of
cleavage reactions. Combining this with (5) gives the following upper bound on
the number of sequences $r_1, r_2, \ldots, r_k$ satisfying (*):

$$\prod_{j=0}^{k-1} [(|F| + 2j)^2 + n(|F| + 2j)] \le [(|F| + 2k)(n + |F| + 2k)]^k \le (n + |F| + 2k)^{2k},$$

and so

$$S_{n,k} \le (n + |F| + 2k)^{2k}.$$

Applying this inequality to (4), with the asymptotic equivalence $|\mathcal{R}_n| \sim n 2^{n+1}$,
gives:

$$\Pr(\exists \text{ RAF } \mathcal{R}' \text{ for } \mathcal{Q}_n : |\mathcal{R}'| \le m) \le \sum_{k=1}^{m} [(3k + |F|) \lambda_v (n + |F| + 2k)^2 / 2^{n+1}]^k. \tag{6}$$

Notice that we can provide an upper bound for the term on the right by the
expression:

$$\sum_{k=1}^{\infty} [(3m + |F|) \lambda_v (n + |F| + 2m)^2 / 2^{n+1}]^k = \theta/(1 - \theta),$$

where $\theta = (3m + |F|) \lambda_v (n + |F| + 2m)^2 / 2^{n+1}$. It follows that if $m \le 2^{cn}$ for
$c < \frac{1}{3}$, then $\theta$ (and thereby $\theta/(1 - \theta)$) converges to zero as $n \to \infty$, and therefore
so too does the expression for the probability in (6). This completes the proof.
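
The role of the threshold $\frac{1}{3}$ in the last step can be made explicit by a rough estimate. The following is only a sketch with unoptimised constants; it assumes $0 < c < \frac{1}{3}$, $m \le 2^{cn}$, and $n$ large enough that $n + |F| \le 2^{cn}$, so that $3m + |F| \le 4 \cdot 2^{cn}$ and $n + |F| + 2m \le 3 \cdot 2^{cn}$:

```latex
\theta
  = \frac{(3m + |F|)\,\lambda_v\,(n + |F| + 2m)^2}{2^{n+1}}
  \le \frac{(4 \cdot 2^{cn})\,\lambda_v\,(3 \cdot 2^{cn})^2}{2^{n+1}}
  = 18\,\lambda_v\,2^{(3c-1)n}
  \longrightarrow 0 \quad (n \to \infty).
```

The exponent $(3c - 1)n$ is negative precisely when $c < \frac{1}{3}$, which is where the condition on $c$ comes from.
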
Comments 

• This result is interesting in the light of Theorem 11 of [3], as the
probability that the length of a first cycle is $k$ when a first cycle appears in a
random digraph is $1/(k(k+1)) + o(1)$, and so short cycles have considerable
probability in that model.

By contrast, when the first RAFs appear, there are no small ones, since
any RAF requires the simultaneous satisfying of two properties: it must
be reflexively autocatalytic and also $F$-generated; the former property is
equivalent to the existence of a directed cycle in the catalysis graph (at
least in the case $p(x, r) = 0$ for $x \in F$); while there might be a small cycle,
it is unlikely to be $F$-generated.

• Theorem 4 provides an interesting complement to the earlier Theorem 3,
which showed that there is, in general, no efficient way to determine the
size of the smallest RAF in a CRS. Thus, it could be difficult to exclude
the possibility of a small RAF in the binary polymer model for large values
of $n$, by searching for the smallest irrRAFs. However, Theorem 4 provides
a theoretical guarantee that, with high probability, there will be no small
RAFs when they first appear within this model.

• The final inequality in the proof of Theorem 4 allows us to place explicit
bounds on the likely minimal size of RAFs for finite values of $n$. For
example, for $n = 40$, the probability that there exists an RAF of size at most
1000 when the existence of an RAF has a probability of 0.5 is less than 0.01
(taking $|F| = 6$ and the conservative value for $\lambda_v$ of 1.7 from Theorem
4.1(ii) of [25]).

• It is easy to show that when the rate of catalysis becomes sufficiently large, 
we will expect to find small RAFs in the binary polymer model. Thus the 
initially largely flat line for irrRAF sizes in Fig. 4 must eventually decrease 
to small values (in the limit of size 1) as the rate of catalysis continues 
to increase. Moreover, small catalytic reaction systems (of size 16) that 
form RAFs (and which contain even smaller RAFs) have recently been 
discovered in real RNA replicator systems [32] . That such small sets form 
RAFs can be partly explained by the high catalysis rate [17]. 
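
The numerical example in the third comment above can be checked by evaluating the quantity $\theta/(1-\theta)$ from the end of the proof of Theorem 4. The sketch below does this directly; the function and parameter names are illustrative, and it relies on the same asymptotic approximation $|\mathcal{R}_n| \sim n2^{n+1}$ that the proof uses, so the value should be read as indicative rather than exact.

```python
def small_raf_bound(n: int, m: int, food_size: int, lam: float) -> float:
    """Upper bound theta / (1 - theta) on the probability that an RAF of size
    at most m exists, with theta as defined at the end of the proof of
    Theorem 4. Returns 1.0 when theta >= 1, where the bound is vacuous."""
    theta = (3 * m + food_size) * lam * (n + food_size + 2 * m) ** 2 / 2 ** (n + 1)
    if theta >= 1:
        return 1.0
    return theta / (1 - theta)


# Values quoted in the comment: n = 40, size 1000, |F| = 6, lambda = 1.7.
bound = small_raf_bound(n=40, m=1000, food_size=6, lam=1.7)
print(f"bound = {bound:.5f}")
```

For these values $\theta$ is just under 0.01, so the bound is informative; for much smaller $n$ (with the same $m$) $\theta$ exceeds 1 and the inequality says nothing.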

7.1. Distribution of irrRAF sizes 

With Theorem 3 above, we proved that finding the smallest (irr)RAF set is 
a hard problem, so we cannot hope to have a polynomial time algorithm to do 
this. However, it is still possible to get an idea of the distribution of the sizes of 
the irrRAF sets that exist inside an RAF set. This can be done as follows. In 
[13], we described a polynomial time algorithm for finding one possible irrRAF 
in a given RAF $\mathcal{R}'$ by removing one reaction $r_i$ from $\mathcal{R}'$ and applying the RAF
algorithm to the set $\mathcal{R}' - \{r_i\}$. If this results in an empty set ($s(\mathcal{R}' - \{r_i\}) = \emptyset$),
then reaction $r_i$ is essential and needs to remain in $\mathcal{R}'$. Otherwise, replace $\mathcal{R}'$
by the non-empty subRAF $s(\mathcal{R}' - \{r_i\})$. Now repeat this procedure with every
next reaction in $\mathcal{R}'$ until all reactions have been considered. The result of this
is an irrRAF of $\mathcal{R}'$. This algorithm was used to generate the data on irrRAF
sizes in Fig. 4.




Figure 5: Histograms of the sizes of 1000 irrRAFs in two RAF sets when they first start to
appear in the binary polymer model (horizontal axis in both panels: irrRAF size).

Note that the particular irrRAF in $\mathcal{R}'$ that is found by this algorithm
depends on the order in which the reactions in $\mathcal{R}'$ are considered for possible
removal. So, by repeating the above algorithm a number of times and randomly
re-ordering the reactions in $\mathcal{R}'$ each time, we can generate a sample of irrRAFs
of $\mathcal{R}'$. Fig. 5 shows two histograms of the sizes of 1000 irrRAFs generated this
way from two of the RAF sets that were found at a level of catalysis of about
$f = 1.20$, i.e., when RAF sets are just starting to show up.
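
The reduction procedure and the random re-ordering described above can be sketched as follows. This is a minimal illustration, not the code used to produce Figs. 4 and 5: the data layout (reaction name mapped to reactants, products and catalysts), the function names, the fixed random seed and the three-reaction toy system are all assumptions.

```python
import random
from typing import Dict, FrozenSet, List, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]  # reactants, products, catalysts
CRS = Dict[str, Reaction]


def max_raf(reactions: CRS, food: Set[str]) -> CRS:
    """RAF algorithm: returns s(R), which is empty if R contains no RAF."""
    current = dict(reactions)
    while True:
        available, changed = set(food), True
        while changed:  # closure of the food set under the current reactions
            changed = False
            for reactants, products, _ in current.values():
                if reactants <= available and not products <= available:
                    available |= products
                    changed = True
        kept = {name: rxn for name, rxn in current.items()
                if rxn[0] <= available and rxn[2] & available}
        if len(kept) == len(current):
            return kept
        current = kept


def irr_raf(raf: CRS, food: Set[str], order: List[str]) -> CRS:
    """Try to remove each reaction in the given order; keep the removal
    whenever a non-empty subRAF survives."""
    current = dict(raf)
    for name in order:
        if name not in current:
            continue  # already discarded by an earlier reduction
        reduced = max_raf({k: v for k, v in current.items() if k != name}, food)
        if reduced:
            current = reduced
    return current


def sample_irr_raf_sizes(raf: CRS, food: Set[str], samples: int, seed: int = 0) -> List[int]:
    """Sizes of irrRAFs found under random removal orders."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(samples):
        order = list(raf)
        rng.shuffle(order)
        sizes.append(len(irr_raf(raf, food, order)))
    return sizes


# Hypothetical toy RAF containing irrRAFs of two different sizes.
fs = frozenset
food = {"a", "b"}
toy: CRS = {
    "r1": (fs("ab"), fs("c"), fs("c")),  # a + b -> c, catalysed by its own product
    "r2": (fs("a"), fs("d"), fs("e")),   # a + a -> d, catalysed by e
    "r3": (fs("b"), fs("e"), fs("d")),   # b + b -> e, catalysed by d
}
sizes = sample_irr_raf_sizes(toy, food, samples=50)
```

In the toy system the procedure returns `{r2, r3}` when `r1` is considered first and `{r1}` otherwise, which illustrates why a random sample of orders gives no guarantee of finding the smallest irrRAF.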

In both cases, the sample is dominated by one particular irrRAF size, with 
the rest being relatively close in size, although the histogram on the right shows a 
case where the smallest irrRAF is about 100 reactions smaller than the dominant 
one. Since this is only a random sample, there is no guarantee that this is indeed 
the smallest irrRAF. However, the fact that even the smallest irrRAF in these 
samples is still rather large (close to 600 reactions) is probably a good indication 
that, indeed, there are no small RAFs when they just start appearing. 

8. Concluding comments 

RAF theory provides a way to address one aspect of the complex question, 
how did life arise? The existence of RAFs does not represent a sufficient con- 
dition, but it would seem to be a necessary one. Moreover, the approach is 
sufficiently general that it can be applied to other emergence phenomena both
inside chemistry and in quite disparate fields (for an application to a 'toy'
problem in economics, see [16]). RAFs are based on two key ideas - every molecule
must be able to be built up from the available set of 'food' molecules by reactions 
from the set, and each reaction must 'eventually' be catalysed. Here 'eventually' 
refers to the fact that some reactions may need to proceed uncatalysed (at a 
lower rate) in order to get the system going, but eventually, all reactions are 
catalysed. A stronger requirement would be that all reactions must be catalysed 
by the available molecules as the system develops (from the food molecules or 
products of reactions that have already occurred). This notion of a
'constructively autocatalytic $F$-generated' (CAF) set from [25] seems an unnecessarily
strong condition (since reactions can generally proceed, at a lower rate, without 
catalysis) and the mathematical properties of CAFs (and the probability that 
they form) are quite different from RAFs [25]. A weaker requirement is that 
only some reactions need to be catalysed - this fits perfectly easily within the 
current RAF framework, as we may simply formally allow a food molecule to 
act as a putative catalyst for those reactions. 

Another weakening of the RAF concept is to consider a closed chemical 
reaction system, which, once established, will continue to be self-maintaining. 
This underlies the notion of an 'organisation' in chemical organisation theory. 
The property of RAFs of being F-generated was shown in [6] to imply the 
property of being an organisation; we have shown here that the converse need not 
hold - in other words, an organisation may not be able to be built starting just 
with the food set, without the presence of some other reactant to get it started. 
This property of an organisation has a superficial similarity to the property 
that an RAF can allow one or more reactions to proceed uncatalysed until
the catalyst is formed. However, there is an important difference, since an
uncatalysed reaction can proceed (at a lower rate), while a reaction that
lacks one of its reactants cannot take place.

The focus of this paper has been on small RAFs, as these are, in some sense, 
the 'simplest' systems that could be of interest in origin-of-life studies. It is 
of interest to know whether within some CRS that harbours an RAF, there 
is a very small one present, or instead whether all subRAFs are quite large. 
The smallest RAFs are irreducible, though not all irreducible RAFs are of the 
smallest size. In contrast to the maximal RAFs, where there is a unique object 
(maxRAF) that can be constructed in polynomial time (by the RAF algorithm), 
there may be exponentially many irreducible RAFs, and finding a smallest RAF 
is, in general, NP-hard. Nevertheless, we can find irrRAFs in polynomial time, 
and we can describe computable lower bounds on the size of irrRAFs and also 
determine if a given (small) collection comprises all the irrRAFs. 

It is also of interest to consider the size and distribution of RAFs in simple 
settings such as the binary polymer model, where simulations suggest that when
RAFs first appear, small irrRAFs are unlikely, a result that has been verified 
formally in Theorem 4. However, as the level of catalysis increases, one is 
guaranteed to eventually find small irrRAFs. 

An interesting problem for future work would be to develop better bounds 
and approximations for the minimal size of an RAF within a catalytic reaction
system. For example, is it possible to obtain a bound for the size of the
smallest RAF that is within some constant factor of optimal? It would also be of
interest to investigate an extension of RAFs that allows some molecules not
only to catalyse some reactions, but also to inhibit other reactions; in this case
determining whether an analogue of an RAF exists within an arbitrary CRS
has been shown to be NP-hard [25], but in certain cases the RAF algorithm can
be adapted to solve this problem [16].

11. Appendix

11.1. Proof of Proposition 5.1

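To establish (ii) $\Rightarrow$ (i), suppose that $s(\mathcal{R} - \mathcal{R}') \neq \emptyset$ and that $\mathcal{R}' \cup s(\mathcal{R} - \mathcal{R}')$
is an RAF for $(\mathcal{Q}, F)$. Now $s(\mathcal{R} - \mathcal{R}')$ is an RAF for $(\mathcal{Q}', F)$, where
$\mathcal{Q}' = (X, \mathcal{R} - \mathcal{R}', C - C')$ with $C' := \{(x, r) : (x, r) \in C, r \in \mathcal{R}'\}$, and so is certainly an RAF
for $(\mathcal{Q}, F)$. Furthermore $\mathcal{R}' \cap s(\mathcal{R} - \mathcal{R}') = \emptyset$, since $s(\mathcal{R} - \mathcal{R}') \subseteq \mathcal{R} - \mathcal{R}'$. Hence
$\mathcal{R}'$ is a co-RAF for $(\mathcal{Q}, F)$.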

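To establish (i) $\Rightarrow$ (ii), suppose that $\mathcal{R}'$ is a co-RAF for $(\mathcal{Q}, F)$. Then there
exists an RAF $\mathcal{R}_1$ for $(\mathcal{Q}, F)$ such that $\mathcal{R}' \cap \mathcal{R}_1 = \emptyset$ and $\mathcal{R}' \cup \mathcal{R}_1$ is an RAF for
$(\mathcal{Q}, F)$. Consider $s(\mathcal{R} - \mathcal{R}')$. Since $\mathcal{R}_1$ is an RAF for $(\mathcal{Q}, F)$ and is a subset of
$s(\mathcal{R} - \mathcal{R}')$, we must have $s(\mathcal{R} - \mathcal{R}') \neq \emptyset$. It remains to show that $\mathcal{R}' \cup s(\mathcal{R} - \mathcal{R}')$
is an RAF for $(\mathcal{Q}, F)$. Suppose that $r \in \mathcal{R}' \cup s(\mathcal{R} - \mathcal{R}')$. If $r \in \mathcal{R}'$, then
all the reactants of $r$ and at least one catalyst are contained in
$\mathrm{cl}_{\mathcal{R}' \cup \mathcal{R}_1}(F)$ (since $\mathcal{R}' \cup \mathcal{R}_1$ is an RAF for $(\mathcal{Q}, F)$), while if $r \in s(\mathcal{R} - \mathcal{R}')$ then
all the reactants of $r$ and at least one catalyst are contained in $\mathrm{cl}_{s(\mathcal{R} - \mathcal{R}')}(F)$
(since $s(\mathcal{R} - \mathcal{R}')$ is an RAF for $(\mathcal{Q}, F)$). Now, $\mathcal{R}' \cup \mathcal{R}_1$ and $s(\mathcal{R} - \mathcal{R}')$ are
both subsets of $\mathcal{R}' \cup s(\mathcal{R} - \mathcal{R}')$, and so $\mathrm{cl}_{\mathcal{R}' \cup \mathcal{R}_1}(F)$ and $\mathrm{cl}_{s(\mathcal{R} - \mathcal{R}')}(F)$ are both
subsets of $\mathrm{cl}_{\mathcal{R}' \cup s(\mathcal{R} - \mathcal{R}')}(F)$. Consequently, every reaction in $\mathcal{R}' \cup s(\mathcal{R} - \mathcal{R}')$ has
all its reactants and at least one catalyst in $\mathrm{cl}_{\mathcal{R}' \cup s(\mathcal{R} - \mathcal{R}')}(F)$, which implies that
$\mathcal{R}' \cup s(\mathcal{R} - \mathcal{R}')$ is an RAF for $(\mathcal{Q}, F)$.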

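To establish (iii) $\Rightarrow$ (i), note that $\mathcal{R}_A$ is an RAF for $(\mathcal{Q}, F)$ such that
$\mathcal{R}' \cap \mathcal{R}_A = \emptyset$ and $\mathcal{R}' \cup \mathcal{R}_A = \mathcal{R}_B$, which is an RAF for $(\mathcal{Q}, F)$. Therefore, $\mathcal{R}'$ is
a co-RAF for $(\mathcal{Q}, F)$.

To establish (i) $\Rightarrow$ (iii), suppose that $\mathcal{R}'$ is a co-RAF for $(\mathcal{Q}, F)$. Then there
exists an RAF $\mathcal{R}_1$ for $(\mathcal{Q}, F)$ such that $\mathcal{R}' \cap \mathcal{R}_1 = \emptyset$ and $\mathcal{R}' \cup \mathcal{R}_1$ is an RAF for
$(\mathcal{Q}, F)$. Trivially, $\mathcal{R}' = (\mathcal{R}' \cup \mathcal{R}_1) - \mathcal{R}_1$ and clearly $\mathcal{R}_1 \subsetneq \mathcal{R}' \cup \mathcal{R}_1$, since $\mathcal{R}'$ is
non-empty by the definition of a co-RAF, so take $\mathcal{R}_A = \mathcal{R}_1$ and $\mathcal{R}_B = \mathcal{R}' \cup \mathcal{R}_1$.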

To establish (i) $\Rightarrow$ (iv), suppose that $\mathcal{R}'$ is a co-RAF for $(\mathcal{Q}, F)$. Then
there exists an RAF $\mathcal{R}_1$ for $(\mathcal{Q}, F)$ such that $\mathcal{R}_1 \cap \mathcal{R}' = \emptyset$ and $\mathcal{R}_1 \cup \mathcal{R}'$ is an
RAF for $(\mathcal{Q}, F)$. It suffices to show that $\mathcal{R}'$ is an RAF for $(\mathcal{Q}, F')$, where
$F' = F \cup \pi(\mathcal{R}_1)$. First we prove that $\mathcal{R}'$ is generated from $F'$, i.e. $\rho(\mathcal{R}') \subseteq \mathrm{cl}_{\mathcal{R}'}(F')$.
Let $m = |\mathcal{R}_1|$, $n = |\mathcal{R}'|$. $\mathcal{R}_1 \cup \mathcal{R}'$ is $F$-generated, so there exists an ordering
$O_u = \mu_1, \ldots, \mu_{m+n}$ of its reactions $\mu_i$ satisfying part (iv) of Lemma 3.1 for the
food set $F$. We herein refer to an ordering satisfying part (iv) of Lemma 3.1
for some food set $F$ as a proper ordering relative to $F$. $\mathcal{R}_1$ is $F$-generated so
there exists a proper ordering relative to $F$, $O_1 = r_1, \ldots, r_m$, of its reactions
$r_j$. Define $O' = r'_1, \ldots, r'_n$ to be the ordering of the reactions of $\mathcal{R}'$ obtained by
deleting from $O_u$ every reaction that also appears in $O_1$, preserving the order of
the remaining reactions. We claim that the concatenation $O_1, O'$, a reordering
of $O_u$, is a proper ordering relative to $F$. Consider any reaction $r' \in \mathcal{R}'$ and a
reactant $x \in \rho(r')$. $\mathcal{R}_1 \cup \mathcal{R}'$ is $F$-generated and $\mathcal{R}' \subseteq \mathcal{R}_1 \cup \mathcal{R}'$, so by part (ii)
of Lemma 3.1 at least one of the following holds: (i) $x \in F$, (ii) $x \in \pi(r)$ for
some $r \in \mathcal{R}_1$, or (iii) $x \in \pi(r'')$ for some $r'' \in \mathcal{R}'$. If (i) alone is true, $r'$ trivially
does not prevent the reordering from being a proper ordering relative to $F$. If
(ii) alone is true, every $r \in \mathcal{R}_1$ precedes $r'$ in $O_1, O'$, so $r'$ certainly does not
prevent the reordering from being a proper ordering relative to $F$. If (iii) alone
is true, $r''$ must precede $r'$ in $O_u$ and the order of the reactions of $\mathcal{R}'$ in $O_u$ is
preserved in $O_1, O'$, so $r''$ precedes $r'$ in $O_1, O'$. If more than one of (i)-(iii) are
true, then since $O_u$ is a proper ordering, at least one of the conclusions will hold,
which is sufficient. Therefore our claim that $O_1, O'$ is a proper ordering relative
to $F$ is justified. It follows that $\rho(r'_1) \subseteq F \cup \pi(\mathcal{R}_1)$, and for each $i \in \{2, \ldots, n\}$,
$\rho(r'_i) \subseteq F \cup \pi(\mathcal{R}_1) \cup \pi(\{r'_1, \ldots, r'_{i-1}\})$, so moreover $O'$ alone is a proper ordering
relative to $F'$. Then, by the implication (iv) $\Rightarrow$ (i) in Lemma 3.1, $\mathcal{R}'$ is generated
from $F'$.

It remains to show that $\mathcal{R}'$ is reflexively autocatalytic. Since $\mathcal{R}'$ is generated
from $F'$ and $\mathcal{R}_1 \cup \mathcal{R}'$ is $F$-generated, we can apply part (ii) of Lemma 3.1 to
$\mathrm{cl}_{\mathcal{R}'}(F')$ and $\mathrm{cl}_{\mathcal{R}_1 \cup \mathcal{R}'}(F)$ to deduce that they are equal. Now, since $\mathcal{R}_1 \cup \mathcal{R}'$ is
reflexively autocatalytic then certainly $\mathcal{R}'$ is reflexively autocatalytic (by definition).

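To establish (iv) $\Rightarrow$ (i), it suffices to show that $\mathcal{R}_1 \cup \mathcal{R}'$ is an RAF for $(\mathcal{Q}, F)$,
since we already have that $\mathcal{R}_1$ is an RAF for $(\mathcal{Q}, F)$ and $\mathcal{R}_1 \cap \mathcal{R}' = \emptyset$. First
we prove that $\mathcal{R}_1 \cup \mathcal{R}'$ is $F$-generated. $\mathcal{R}_1$ is $F$-generated, so there exists a
proper ordering relative to $F$ of its reactions $r_1, \ldots, r_m$. Similarly for $\mathcal{R}'$ there
exists a proper ordering relative to $F \cup \pi(\mathcal{R}_1)$ of its reactions $r'_1, \ldots, r'_n$. Hence
the concatenation $r_1, \ldots, r_m, r'_1, \ldots, r'_n$ is a proper ordering relative to $F$ of the
reactions in $\mathcal{R}_1 \cup \mathcal{R}'$, so $\mathcal{R}_1 \cup \mathcal{R}'$ is $F$-generated. It remains to show that $\mathcal{R}_1 \cup \mathcal{R}'$
is reflexively autocatalytic. Since $\mathcal{R}_1$, $\mathcal{R}'$ and $\mathcal{R}_1 \cup \mathcal{R}'$ are each generated from their
respective food sets, we can apply part (ii) of Lemma 3.1 to each of $\mathrm{cl}_{\mathcal{R}_1}(F)$,
$\mathrm{cl}_{\mathcal{R}'}(F')$ and $\mathrm{cl}_{\mathcal{R}_1 \cup \mathcal{R}'}(F)$
to deduce that $\mathrm{cl}_{\mathcal{R}_1}(F) \subseteq \mathrm{cl}_{\mathcal{R}'}(F') = \mathrm{cl}_{\mathcal{R}_1 \cup \mathcal{R}'}(F)$. Now since $\mathcal{R}_1$ and $\mathcal{R}'$ are
reflexively autocatalytic then certainly $\mathcal{R}_1 \cup \mathcal{R}'$ is reflexively autocatalytic. This
completes the proof.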

11.2. Proof of Theorem 3 

Proof: MIN-RAF is clearly in the complexity class NP, since one can verify
in polynomial time if a given subset of $\mathcal{R}$ has size, at most, $k$ and forms an
RAF. We will reduce the graph theory problem VERTEX COVER to MIN-RAF.
Recall that for a graph $G = (V, E)$, a vertex cover of $G$ is a subset $V'$ of
$V$ with the property that each edge of $G$ is incident with at least one vertex in
$V'$; VERTEX COVER has as its instance a graph $G = (V, E)$ and an integer
$K$ and we ask whether or not $G$ has a vertex cover of size, at most, $K$. This is
a well-known NP-complete problem [11] (indeed, one of Karp's original 21
NP-complete problems). Given an instance $(G = (V, E), K)$ of VERTEX COVER,
we show how to construct an instance $(X_G, \mathcal{R}_G, C_G, F_G, k)$ of MIN-RAF for
which the answers to the two decision problems are identical.

We first construct $F_G$ and $X_G$. For each $v \in V$, let $a_v, b_v$ be two distinct
elements of $F_G$ and let $x_v$ be an element of $X_G - F_G$. Order $E$ as $e_1, \ldots, e_{|E|}$
and for each $j = 1, \ldots, |E|$, let $d_j$ be a distinct element of $F_G$ and $y_j$ an element
of $X_G - F_G$. Let $d_0$ be another distinct element of $F_G$. Thus $F_G$ consists of the
$2|V| + |E| + 1$ elements:

$$F_G := \{d_j : 0 \le j \le |E|\} \cup \{a_v, b_v : v \in V\}$$

and $X_G - F_G$ consists of $|V| + |E|$ elements:

$$X_G - F_G := \{x_v : v \in V\} \cup \{y_j : 1 \le j \le |E|\}.$$

For each $v \in V$, define a reaction:

$$r_v : a_v + b_v \to x_v.$$

Figure 6: (i) A graph $G$ and (ii) the associated CRS $\mathcal{Q}_G$, consisting of 8 reactions that form
an RAF, and with the super-catalyst ($y_4$) at the top. The two smallest sub-RAFs of this
system are formed by adding either $r_a$ and $r_c$ or $r_b$ and $r_c$ to the four reactions $r'_1, \ldots, r'_4$, and
these two choices correspond to the two smallest vertex covers of $G$, namely $\{a, c\}$ and $\{b, c\}$.

For each $1 < j \le |E|$, define the reaction:

$$r'_j : y_{j-1} + d_j \to y_j,$$

and for $j = 1$, let:

$$r'_1 : d_0 + d_1 \to y_1.$$

For any subset $U$ of $V$ let:

$$\mathcal{R}_U = \{r_v : v \in U\},$$

and let

$$\mathcal{R}_V := \{r_v : v \in V\} \quad \text{and} \quad \mathcal{R}_E := \{r'_j : 1 \le j \le |E|\},$$

and set

$$\mathcal{R}_G = \mathcal{R}_V \cup \mathcal{R}_E.$$

Thus, we have specified $X_G$, $F_G$ and $\mathcal{R}_G$ and it remains to define the catalysis
assignment ($C_G$), which is as follows:

• If $e_j = \{u_j, v_j\}$ (where $u_j, v_j \in V$) then $r'_j$ is catalysed by both $x_{u_j}$ and
$x_{v_j}$ (but by no other molecules).

• In addition, each reaction $r_v$ ($v \in V$) is catalysed by $y_{|E|}$ and by no other
molecule - we call the molecule $y_{|E|}$ the super-catalyst.

An example of this construction is illustrated in Fig. 6. We have now fully
specified the catalysis and thereby the pair $(\mathcal{Q}_G, F_G)$ constructed from $G$
(where $\mathcal{Q}_G = (X_G, \mathcal{R}_G, C_G)$).
Claims: 

• $\mathcal{R}_G$ is an RAF for $(\mathcal{Q}_G, F_G)$.

• A subset $\mathcal{R}'$ of $\mathcal{R}_G$ is an RAF for $(\mathcal{Q}_G, F_G)$ if and only if $\mathcal{R}' = \mathcal{R}_{V'} \cup \mathcal{R}_E$
for a vertex cover $V'$ of $G$.

• The vertex covers of $G$ of size $K$ are in one-to-one correspondence with
the sub-RAFs of $\mathcal{R}_G$ of size $K + |E|$.

The first claim is readily verified. 

To establish the second claim, suppose that $V'$ is a vertex cover of $G$ of size $K$,
and let $\mathcal{R}' = \mathcal{R}_{V'} \cup \mathcal{R}_E$. Then
every reaction in $\mathcal{R}_E$ is catalysed by the product of at least one reaction in $\mathcal{R}_{V'}$.
Moreover, the product of $r'_{|E|}$ catalyses all the remaining reactions. Thus, $\mathcal{R}'$
is reflexively autocatalytic, and it is also clear that $\mathcal{R}'$ is $F_G$-generated; thus $\mathcal{R}'$
is an RAF and it has $K + |E|$ reactions. Conversely, suppose that $\mathcal{R}''$ is an
RAF for $(\mathcal{Q}_G, F_G)$ of size at most $K + |E|$. If $r'_{|E|}$ is not in $\mathcal{R}''$ then the
super-catalyst is not produced by any reaction in $\mathcal{R}''$ so none of the reactions in $\mathcal{R}_V$
is catalysed; moreover, because the products from these last reactions provide
the only catalysts for $\mathcal{R}_E$ it follows that $\mathcal{R}'' = \emptyset$. Thus, since $\mathcal{R}''$ is non-empty
(being an RAF), $r'_{|E|}$ must be an element of $\mathcal{R}''$, and in order to construct the
reactants of $r'_{|E|}$, all the reactions in $\mathcal{R}_E$ must form a subset of $\mathcal{R}''$. In order for all
these reactions to be catalysed, at least one of the reactions $r_{u_j}$ and $r_{v_j}$ must
lie in $\mathcal{R}''$ for each $1 \le j \le |E|$. Thus $\{v : r_v \in \mathcal{R}''\}$ is a vertex cover of $G$ and it
has size, at most, $(K + |E|) - |E| = K$ as claimed. This establishes the required
reduction, and thereby completes the proof of the second claim.

The third claim follows by noting that the association $V' \mapsto \mathcal{R}_{V'} \cup \mathcal{R}_E$
maps vertex covers of $G$ of size $K$ onto sub-RAFs of $\mathcal{R}_G$ of size $K + |E|$ (by
the previous claim) and two different vertex covers are mapped to distinct
sub-RAFs. This completes the proof.

Part (i) of Theorem 3 now follows from the first two claims, while Part (ii) 
of Theorem 3 follows from the third claim, combined with the #P-completeness 
of counting vertex covers of a graph and minimum vertex covers of a graph (see 
[31]). 
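
The construction can be checked by brute force on a small graph. The sketch below is illustrative only: the function names and the data layout are assumptions, the reactions $r_v$ are taken to be $a_v + b_v \to x_v$, and the four-vertex graph is a hypothetical one chosen so that its two smallest vertex covers are $\{a, c\}$ and $\{b, c\}$, as in the caption of Fig. 6. The exhaustive searches are exponential and serve only to confirm the correspondence on a tiny instance.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]  # reactants, products, catalysts


def build_crs(vertices: List[str], edges: List[Tuple[str, str]]):
    """Food set and reactions of the CRS associated with a graph."""
    m = len(edges)
    food = {f"d{j}" for j in range(m + 1)}
    food |= {f"a_{v}" for v in vertices} | {f"b_{v}" for v in vertices}
    reactions: Dict[str, Reaction] = {}
    for v in vertices:  # r_v, catalysed only by the super-catalyst y_|E|
        reactions[f"r_{v}"] = (frozenset({f"a_{v}", f"b_{v}"}),
                               frozenset({f"x_{v}"}), frozenset({f"y{m}"}))
    for j, (u, v) in enumerate(edges, start=1):  # r'_j, catalysed by x_u and x_v
        first = "d0" if j == 1 else f"y{j - 1}"
        reactions[f"r'_{j}"] = (frozenset({first, f"d{j}"}),
                                frozenset({f"y{j}"}), frozenset({f"x_{u}", f"x_{v}"}))
    return food, reactions


def is_raf(subset: Dict[str, Reaction], food: Set[str]) -> bool:
    """Non-empty, and every reaction has its reactants and a catalyst in the closure."""
    if not subset:
        return False
    available, changed = set(food), True
    while changed:
        changed = False
        for reactants, products, _ in subset.values():
            if reactants <= available and not products <= available:
                available |= products
                changed = True
    return all(r <= available and c & available for r, _, c in subset.values())


def smallest_raf_size(food: Set[str], reactions: Dict[str, Reaction]) -> Optional[int]:
    """Exhaustive search over reaction subsets, smallest first."""
    names = list(reactions)
    for size in range(1, len(names) + 1):
        for chosen in combinations(names, size):
            if is_raf({name: reactions[name] for name in chosen}, food):
                return size
    return None


def min_vertex_cover_size(vertices: List[str], edges: List[Tuple[str, str]]) -> int:
    """Exhaustive search over vertex subsets, smallest first."""
    for size in range(len(vertices) + 1):
        for chosen in combinations(vertices, size):
            if all(u in chosen or v in chosen for u, v in edges):
                return size
    return len(vertices)


# Hypothetical graph: a triangle a-b-c with a pendant vertex d attached to c.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
food, reactions = build_crs(vertices, edges)
```

On this instance the smallest RAF has $2 + |E| = 6$ reactions, matching the minimum vertex cover size of 2.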

Remark: We have ensured in the proof above that each reaction has just
two reactants, in line with the binary polymer model. However, the attentive
reader will notice that $F$ may have to be quite large. Nevertheless, it is quite
straightforward to modify this example so that $F$ is kept small (e.g. of size 6),
and to implement the construction within the constraints of the binary polymer
cleavage-ligation model.